Paper Title
Safe reinforcement learning for probabilistic reachability and safety specifications: A Lyapunov-based approach
Paper Authors
Paper Abstract
Emerging applications in robotics and autonomous systems, such as autonomous driving and robotic surgery, often involve critical safety constraints that must be satisfied even when information about system models is limited. In this regard, we propose a model-free safety specification method that learns the maximal probability of safe operation by carefully combining probabilistic reachability analysis and safe reinforcement learning (RL). Our approach constructs a Lyapunov function with respect to a safe policy to restrain each policy improvement stage. As a result, it yields a sequence of safe policies that determine the range of safe operation, called the safe set, which monotonically expands and gradually converges. We also develop an efficient safe exploration scheme that accelerates the process of identifying the safety of unexamined states. Exploiting the Lyapunov shielding, our method regulates the exploratory policy to avoid dangerous states with high confidence. To handle high-dimensional systems, we further extend our approach to deep RL by introducing a Lagrangian relaxation technique to establish a tractable actor-critic algorithm. The empirical performance of our method is demonstrated through continuous control benchmark problems, such as a reaching task on a planar robot arm.
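The abstract's core mechanism — treating the probability of safe operation as a value function and accepting a policy update only when a Lyapunov-style condition guarantees safety does not degrade — can be illustrated on a toy problem. The following is a minimal tabular sketch, not the paper's actual algorithm: the five-state MDP, the `safety_value` helper, and the acceptance rule are illustrative assumptions chosen so the monotone expansion of the safe values is easy to verify by hand.

```python
import numpy as np

# Hypothetical toy MDP: states 0..4, where 0 is a safe absorbing state and
# 4 is an unsafe absorbing (failure) state. Each action steps toward state 0
# but risks falling into state 4.
n_states, n_actions = 5, 2
P = np.zeros((n_states, n_actions, n_states))
P[0, :, 0] = 1.0                              # safe harbor: absorbing
P[4, :, 4] = 1.0                              # failure state: absorbing
for s in (1, 2, 3):
    P[s, 0, s - 1], P[s, 0, 4] = 0.9, 0.1     # action 0: careful step
    P[s, 1, s - 1], P[s, 1, 4] = 0.6, 0.4     # action 1: risky step

def safety_value(policy, n_iter=50):
    """Probability of never reaching the unsafe state under `policy`,
    computed as the fixed point of V(s) = E[V(s')] with V(unsafe) = 0."""
    V = np.array([1.0, 1.0, 1.0, 1.0, 0.0])
    for _ in range(n_iter):
        V = P[np.arange(n_states), policy] @ V
    return V

policy = np.ones(n_states, dtype=int)         # baseline: risky everywhere
V = safety_value(policy)                      # [1, 0.6, 0.36, 0.216, 0]
for s in range(n_states):
    for a in range(n_actions):
        # Lyapunov-style acceptance: only adopt an action whose expected
        # safety value is at least that of the current policy's action,
        # so the safety probabilities (and hence the safe set) cannot shrink.
        if P[s, a] @ V >= P[s, policy[s]] @ V:
            policy[s] = a
V_improved = safety_value(policy)             # [1, 0.9, 0.81, 0.729, 0]
```

After one constrained improvement sweep every state's safety probability weakly increases, mirroring the monotonically expanding safe set described above; the paper's method replaces the enumerated tabular update with a model-free, actor-critic formulation via Lagrangian relaxation.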