Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems

Paper title

Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems

Authors

Yashwanth Kumar Nakka, Anqi Liu, Guanya Shi, Anima Anandkumar, Yisong Yue, Soon-Jo Chung

Abstract

Learning-based control algorithms require data collection with abundant supervision for training. Safe exploration algorithms ensure the safety of this data collection process even when only partial knowledge is available. We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained stochastic optimal control with dynamics learning and feedback control. We derive an iterative convex optimization algorithm that solves an Information-cost Stochastic Nonlinear Optimal Control problem (Info-SNOC). The optimization objective encodes control cost for performance and exploration cost for learning, and safety is incorporated as distributionally robust chance constraints. The dynamics are predicted from a robust regression model that is learned from data. The Info-SNOC algorithm is used to compute a sub-optimal pool of safe motion plans that aid in exploration for learning unknown residual dynamics under safety constraints. A stable feedback controller is used to execute the motion plan and collect data for model learning. We prove the safety of rollout from our exploration method and reduction in uncertainty over epochs, thereby guaranteeing the consistency of our learning method. We validate the effectiveness of Info-SNOC by designing and implementing a pool of safe trajectories for a planar robot. We demonstrate that our approach has a higher success rate in ensuring safety when compared to a deterministic trajectory optimization approach.
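
The abstract states that safety enters as distributionally robust chance constraints but does not give their form. As a rough illustration only, the sketch below uses the standard textbook reformulation for a single linear constraint a^T x + b <= 0 when only the mean and covariance of x are known: the constraint holds with probability at least 1 - eps for every distribution with those moments if a^T mu + b + sqrt((1 - eps) / eps) * sqrt(a^T Sigma a) <= 0. The function name, arguments, and numbers are hypothetical, and this is not necessarily the exact formulation used in the paper.

```python
import math

def dr_chance_constraint_margin(a, b, mean, cov, eps):
    """Deterministic surrogate of a distributionally robust chance constraint.

    Returns a^T mean + b + sqrt((1 - eps) / eps) * sqrt(a^T cov a).
    A value <= 0 means the linear constraint a^T x + b <= 0 holds with
    probability at least 1 - eps for any distribution with this mean and
    covariance. All names here are illustrative assumptions.
    """
    n = len(a)
    mean_term = sum(a[i] * mean[i] for i in range(n)) + b
    variance = sum(a[i] * cov[i][j] * a[j] for i in range(n) for j in range(n))
    kappa = math.sqrt((1.0 - eps) / eps)  # tightening factor; grows as eps shrinks
    return mean_term + kappa * math.sqrt(variance)

# Hypothetical example: keep x[0] <= 2 with 95% confidence.
margin = dr_chance_constraint_margin(
    a=[1.0, 0.0], b=-2.0,
    mean=[0.0, 0.0],
    cov=[[0.01, 0.0], [0.0, 0.01]],
    eps=0.05,
)
is_safe = margin <= 0.0
```

The tightening term scales with the standard deviation along the constraint direction, so a planner using such a surrogate keeps a larger distance from constraint boundaries wherever the state uncertainty is larger.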
