Paper Title
Neural Lyapunov Model Predictive Control: Learning Safe Global Controllers from Sub-optimal Examples
Paper Authors
Abstract
With a growing interest in data-driven control techniques, Model Predictive Control (MPC) provides an opportunity to exploit the surplus of data reliably, particularly while taking safety and stability into account. In many real-world and industrial applications, an existing control strategy is typically available, for instance one executed by a human operator. The objective of this work is to improve upon this unknown, safe but sub-optimal policy by learning a new controller that retains safety and stability. Learning how to be safe is achieved directly from data and from knowledge of the system constraints. The proposed algorithm alternately learns the terminal cost and updates the MPC parameters according to a stability metric. The terminal cost is constructed as a Lyapunov function neural network with the aim of recovering or extending the stable region of the initial demonstrator using a short prediction horizon. Theorems are presented that characterize the stability and performance of the learned MPC in the presence of model uncertainties and sub-optimality due to function approximation. The efficacy of the proposed algorithm is demonstrated on non-linear continuous control tasks with soft constraints. In practice, the proposed approach can also improve upon the initial demonstrator and achieve better stability than popular reinforcement learning baselines.
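To make the idea of a terminal cost built as a Lyapunov function neural network more concrete, the following is a minimal sketch and not the authors' implementation. It assumes one common way of keeping a learned function positive definite, V(x) = x^T (eps I + G(x)^T G(x)) x, where G(x) is the output of a small network; the abstract does not specify the parameterization. The names `init_params`, `lyapunov_candidate`, `decrease_violation`, and the values of `eps` and `decay` are all hypothetical.

```python
import numpy as np


def init_params(state_dim, hidden_dim, out_dim, seed=0):
    """Randomly initialise a one-hidden-layer network (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.5, size=(hidden_dim, state_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(scale=0.5, size=(out_dim * state_dim, hidden_dim)),
        "b2": np.zeros(out_dim * state_dim),
    }


def lyapunov_candidate(params, x, eps=1e-2):
    """Evaluate V(x) = x^T (eps*I + G(x)^T G(x)) x.

    Assumed parameterisation: G(x) is a network output reshaped into a
    matrix, so V(0) = 0 and V(x) >= eps * ||x||^2 for every x.
    """
    x = np.asarray(x, dtype=float)
    h = np.tanh(params["W1"] @ x + params["b1"])
    G = (params["W2"] @ h + params["b2"]).reshape(-1, x.size)
    Gx = G @ x
    return eps * float(x @ x) + float(Gx @ Gx)


def decrease_violation(params, x, x_next, decay=0.99):
    """Hinge penalty that is positive when V fails to shrink along a step.

    Zero means V(x_next) <= decay * V(x), i.e. the sampled transition is
    consistent with a Lyapunov decrease condition (decay is an assumption).
    """
    v_now = lyapunov_candidate(params, x)
    v_next = lyapunov_candidate(params, x_next)
    return max(0.0, v_next - decay * v_now)
```

In a training loop of the kind the abstract describes, a penalty such as `decrease_violation` could be averaged over transitions collected from the demonstrator and minimised with respect to the network weights, with the resulting V then used as the MPC terminal cost. That loop is omitted here because the abstract gives no details about it.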