Paper Title
Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings
Paper Authors
Paper Abstract
Reinforcement learning (RL) in real-world safety-critical target settings like urban driving is hazardous, imperiling the RL agent, other agents, and the environment. To overcome this difficulty, we propose a "safety-critical adaptation" task setting: an agent first trains in non-safety-critical "source" environments such as in a simulator, before it adapts to the target environment where failures carry heavy costs. We propose a solution approach, CARL, that builds on the intuition that prior experience in diverse environments equips an agent to estimate risk, which in turn enables relative safety through risk-averse, cautious adaptation. CARL first employs model-based RL to train a probabilistic model to capture uncertainty about transition dynamics and catastrophic states across varied source environments. Then, when exploring a new safety-critical environment with unknown dynamics, the CARL agent plans to avoid actions that could lead to catastrophic states. In experiments on car driving, cartpole balancing, half-cheetah locomotion, and robotic object manipulation, CARL successfully acquires cautious exploration behaviors, yielding higher rewards with fewer failures than strong RL adaptation baselines. Website at https://sites.google.com/berkeley.edu/carl.
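The abstract describes CARL's core loop: train a probabilistic ensemble over dynamics and catastrophic states in source environments, then plan in the target environment so as to avoid actions that any ensemble member predicts could lead to catastrophe. The following is a minimal illustrative sketch of that idea only, not the paper's implementation: the toy 1-D dynamics, the hand-built ensemble, the random-shooting planner, and all names (`make_ensemble`, `plan_cautious`, the goal at `x = 1.5`, the catastrophe threshold) are assumptions introduced here for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D setting: the state is a position; |x| > 2 is a "catastrophic" state.
CATASTROPHE = 2.0

def make_ensemble(n_models=5):
    """Stand-in for a learned probabilistic dynamics ensemble: each member
    predicts the next state with its own bias, so members disagree where
    the dynamics are uncertain."""
    biases = rng.normal(0.0, 0.1, size=n_models)

    def predict(model_idx, state, action):
        return state + action + biases[model_idx]

    return n_models, predict

def plan_cautious(state, predict, n_models, horizon=5, n_candidates=100,
                  catastrophe_penalty=100.0, goal=1.5):
    """Random-shooting planner: score each candidate action sequence under
    every ensemble member, heavily penalizing any predicted catastrophe.
    The large penalty makes the agent risk-averse: a sequence that is
    catastrophic under even one member scores poorly on average."""
    best_seq, best_score = None, -np.inf
    for _ in range(n_candidates):
        seq = rng.uniform(-0.5, 0.5, size=horizon)
        score = 0.0
        for m in range(n_models):
            s = state
            for a in seq:
                s = predict(m, s, a)
                if abs(s) > CATASTROPHE:
                    score -= catastrophe_penalty
                score -= abs(s - goal)  # task reward: approach the goal
        score /= n_models
        if score > best_score:
            best_score, best_seq = score, seq
    return best_seq[0]  # execute only the first action (MPC-style)
```

Executing only the first action and replanning each step (model-predictive control) lets the planner react as the ensemble's predictions are refined in the new environment.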