Paper Title

Neurosymbolic Reinforcement Learning with Formally Verified Exploration

Authors

Greg Anderson, Abhinav Verma, Isil Dillig, Swarat Chaudhuri

Abstract

We present Revel, a partially neural reinforcement learning (RL) framework for provably safe exploration in continuous state and action spaces. A key challenge for provably safe deep RL is that repeatedly verifying neural networks within a learning loop is computationally infeasible. We address this challenge using two policy classes: a general, neurosymbolic class with approximate gradients and a more restricted class of symbolic policies that allows efficient verification. Our learning algorithm is a mirror descent over policies: in each iteration, it safely lifts a symbolic policy into the neurosymbolic space, performs safe gradient updates to the resulting policy, and projects the updated policy into the safe symbolic subset, all without requiring explicit verification of neural networks. Our empirical results show that Revel enforces safe exploration in many scenarios in which Constrained Policy Optimization does not, and that it can discover policies that outperform those learned through prior approaches to verified exploration.
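To make the mirror-descent structure described in the abstract concrete, here is a minimal, illustrative Python sketch of the lift / safe-update / project loop. Every name in it (LinearSymbolicPolicy, NeurosymbolicPolicy, lift, project_to_symbolic, the least-squares projection, and the stand-in gradient) is a hypothetical assumption made for illustration, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch only: all classes and helpers below are hypothetical
# stand-ins for the policy classes and operations named in the abstract.

class LinearSymbolicPolicy:
    """Restricted symbolic policy: a linear controller that is cheap to verify."""
    def __init__(self, weights):
        self.weights = np.asarray(weights, dtype=float)

    def act(self, state):
        return self.weights @ state


class NeurosymbolicPolicy:
    """General policy: a neural correction layered on a verified symbolic policy."""
    def __init__(self, symbolic, neural_weights):
        self.symbolic = symbolic
        self.neural_weights = np.asarray(neural_weights, dtype=float)

    def act(self, state):
        # The symbolic component carries the safety guarantee;
        # the neural term refines its behavior.
        return self.symbolic.act(state) + np.tanh(self.neural_weights @ state)


def lift(symbolic):
    """Lift a symbolic policy into the neurosymbolic space (zero neural part)."""
    return NeurosymbolicPolicy(symbolic, np.zeros_like(symbolic.weights))


def safe_gradient_update(policy, grad_estimate, lr=0.01):
    """Approximate gradient step on the neural component only."""
    policy.neural_weights -= lr * grad_estimate
    return policy


def project_to_symbolic(policy, states):
    """Project the updated policy back into the verifiable symbolic class
    by fitting a linear controller to its behavior (least-squares imitation)."""
    actions = np.array([policy.act(s) for s in states])
    weights, *_ = np.linalg.lstsq(np.asarray(states), actions, rcond=None)
    return LinearSymbolicPolicy(weights)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    symbolic = LinearSymbolicPolicy(rng.normal(size=3))
    sample_states = rng.normal(size=(64, 3))

    for iteration in range(5):
        neuro = lift(symbolic)                    # lift into the neurosymbolic space
        fake_grad = rng.normal(size=3)            # stand-in for a policy-gradient estimate
        neuro = safe_gradient_update(neuro, fake_grad)
        symbolic = project_to_symbolic(neuro, sample_states)  # project back to the safe class
        print(f"iter {iteration}: symbolic weights = {symbolic.weights}")
```

The key point the sketch tries to convey is the one the abstract makes: verification is only ever applied to the restricted symbolic class, so the learning loop never has to verify a neural network directly.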
