Cautious Bayesian MPC: Regret Analysis and Bounds on the Number of Unsafe Learning Episodes

Paper Title

Cautious Bayesian MPC: Regret Analysis and Bounds on the Number of Unsafe Learning Episodes

Authors

Wabersich, Kim P.; Zeilinger, Melanie N.

Abstract

This paper investigates the combination of model predictive control (MPC) concepts and posterior sampling techniques and proposes a simple constraint tightening technique to introduce cautiousness during explorative learning episodes. The provided theoretical analysis in terms of cumulative regret focuses on previously stated sufficient conditions of the resulting "Cautious Bayesian MPC" algorithm and shows Lipschitz continuity of the future reward function in the case of linear MPC problems. In the case of nonlinear MPC problems, it is shown that commonly required assumptions for nonlinear MPC optimization techniques provide sufficient criteria for model-based reinforcement learning (RL) using posterior sampling. Furthermore, it is shown that the proposed constraint tightening implies a bound on the expected number of unsafe learning episodes in both the linear and nonlinear cases using a soft-constrained MPC formulation. The efficiency of the method is illustrated using numerical examples.
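To make the episodic structure described in the abstract more concrete, the following is a minimal, hypothetical sketch and not the authors' algorithm. It assumes a scalar linear system x+ = theta*x + u + w with one unknown parameter theta, a Gaussian prior on theta with a conjugate Bayesian update, and a brute-force MPC over a discrete input grid. At the start of each episode, one model is drawn from the posterior (posterior sampling), and the MPC plans with that sampled model against a state bound tightened by a constant margin `delta`, softened by an L1 penalty so that the problem is always feasible. The function names, the constant tightening margin, the penalty weight `rho`, and all numerical values are illustrative assumptions; the paper's actual tightening and soft-constraint formulation may differ.

```python
import itertools
import math
import random


def posterior_update(mu0, var0, xs, ys, noise_var):
    """Conjugate Gaussian posterior for theta in y = theta * x + w, w ~ N(0, noise_var)."""
    precision = 1.0 / var0 + sum(x * x for x in xs) / noise_var
    mean = (mu0 / var0 + sum(x * y for x, y in zip(xs, ys)) / noise_var) / precision
    return mean, 1.0 / precision


def soft_mpc(x0, theta, horizon, u_grid, x_bound, r=0.1, rho=100.0):
    """Brute-force MPC for x+ = theta * x + u over a discrete input grid (hypothetical).

    The state constraint |x| <= x_bound is softened with an L1 penalty
    rho * max(0, |x| - x_bound), so every input sequence is admissible.
    Returns the first input of the lowest-cost sequence.
    """
    best_cost, best_u = math.inf, 0.0
    for seq in itertools.product(u_grid, repeat=horizon):
        x, cost = x0, 0.0
        for u in seq:
            x = theta * x + u
            cost += x * x + r * u * u + rho * max(0.0, abs(x) - x_bound)
        if cost < best_cost:
            best_cost, best_u = cost, seq[0]
    return best_u


def run_cautious_posterior_sampling(episodes=20, steps=10, theta_true=1.2,
                                    x_max=1.0, delta=0.2, seed=0):
    """Episodic posterior-sampling MPC with a tightened soft state constraint.

    Returns (posterior mean, posterior variance, number of unsafe episodes),
    where an episode counts as unsafe if the true state ever leaves |x| <= x_max.
    """
    rng = random.Random(seed)
    mu0, var0 = 0.0, 1.0                    # assumed Gaussian prior on theta
    noise_var = 0.01 ** 2                   # assumed process-noise variance
    u_grid = [i / 4 for i in range(-4, 5)]  # inputs in [-1, 1]
    mu, var = mu0, var0
    xs, ys, unsafe = [], [], 0
    for _ in range(episodes):
        theta_s = rng.gauss(mu, math.sqrt(var))  # one posterior sample per episode
        x, violated = 0.8, False
        for _ in range(steps):
            # Plan with the sampled model against the tightened bound x_max - delta.
            u = soft_mpc(x, theta_s, 3, u_grid, x_max - delta)
            x_next = theta_true * x + u + rng.gauss(0.0, math.sqrt(noise_var))
            xs.append(x)
            ys.append(x_next - u)  # y = theta_true * x + w
            x = x_next
            violated = violated or abs(x) > x_max
        unsafe += violated
        # Recompute the posterior from the prior using all data collected so far.
        mu, var = posterior_update(mu0, var0, xs, ys, noise_var)
    return mu, var, unsafe
```

For example, `mu, var, unsafe = run_cautious_posterior_sampling()` returns the learned posterior over theta and a count of unsafe episodes; increasing `delta` is one simple way to trade performance for fewer constraint violations while the posterior is still wide.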