Paper Title
Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings
Paper Authors
Paper Abstract
Reinforcement learning (RL) in real-world safety-critical target settings like urban driving is hazardous, imperiling the RL agent, other agents, and the environment. To overcome this difficulty, we propose a "safety-critical adaptation" task setting: an agent first trains in non-safety-critical "source" environments such as in a simulator, before it adapts to the target environment where failures carry heavy costs. We propose a solution approach, CARL, that builds on the intuition that prior experience in diverse environments equips an agent to estimate risk, which in turn enables relative safety through risk-averse, cautious adaptation. CARL first employs model-based RL to train a probabilistic model to capture uncertainty about transition dynamics and catastrophic states across varied source environments. Then, when exploring a new safety-critical environment with unknown dynamics, the CARL agent plans to avoid actions that could lead to catastrophic states. In experiments on car driving, cartpole balancing, half-cheetah locomotion, and robotic object manipulation, CARL successfully acquires cautious exploration behaviors, yielding higher rewards with fewer failures than strong RL adaptation baselines. Website at https://sites.google.com/berkeley.edu/carl.
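The abstract describes CARL's core loop: train a probabilistic ensemble over dynamics and catastrophic states in source environments, then plan in the target environment so as to avoid actions that any ensemble member predicts could lead to catastrophe. The following is a minimal illustrative sketch of that idea only, not the paper's implementation: the toy 1-D dynamics, the hand-built ensemble, the random-shooting planner, and all names (`make_ensemble`, `plan_cautious`, the goal at `x = 1.5`, the catastrophe threshold) are assumptions introduced here for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D setting: the state is a position; |x| > 2 is a "catastrophic" state.
CATASTROPHE = 2.0

def make_ensemble(n_models=5):
    """Stand-in for a learned probabilistic dynamics ensemble: each member
    predicts the next state with its own bias, so members disagree where
    the dynamics are uncertain."""
    biases = rng.normal(0.0, 0.1, size=n_models)

    def predict(model_idx, state, action):
        return state + action + biases[model_idx]

    return n_models, predict

def plan_cautious(state, predict, n_models, horizon=5, n_candidates=100,
                  catastrophe_penalty=100.0, goal=1.5):
    """Random-shooting planner: score each candidate action sequence under
    every ensemble member, heavily penalizing any predicted catastrophe.
    The large penalty makes the agent risk-averse: a sequence that is
    catastrophic under even one member scores poorly on average."""
    best_seq, best_score = None, -np.inf
    for _ in range(n_candidates):
        seq = rng.uniform(-0.5, 0.5, size=horizon)
        score = 0.0
        for m in range(n_models):
            s = state
            for a in seq:
                s = predict(m, s, a)
                if abs(s) > CATASTROPHE:
                    score -= catastrophe_penalty
                score -= abs(s - goal)  # task reward: approach the goal
        score /= n_models
        if score > best_score:
            best_score, best_seq = score, seq
    return best_seq[0]  # execute only the first action (MPC-style)
```

Executing only the first action and replanning each step (model-predictive control) lets the planner react as the ensemble's predictions are refined in the new environment.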