Multiplicative Controller Fusion: Leveraging Algorithmic Priors for Sample-efficient Reinforcement Learning and Safe Sim-To-Real Transfer

Paper Title

Multiplicative Controller Fusion: Leveraging Algorithmic Priors for Sample-efficient Reinforcement Learning and Safe Sim-To-Real Transfer

Authors

Rana, Krishan, Dasagi, Vibhavari, Talbot, Ben, Milford, Michael, Sünderhauf, Niko

Abstract

Learning-based approaches often outperform hand-coded algorithmic solutions for many problems in robotics. However, learning long-horizon tasks on real robot hardware can be intractable, and transferring a learned policy from simulation to reality is still extremely challenging. We present a novel approach to model-free reinforcement learning that can leverage existing sub-optimal solutions as an algorithmic prior during training and deployment. During training, our gated fusion approach enables the prior to guide the initial stages of exploration, increasing sample-efficiency and enabling learning from sparse long-horizon reward signals. Importantly, the policy can learn to improve beyond the performance of the sub-optimal prior since the prior's influence is annealed gradually. During deployment, the policy's uncertainty provides a reliable strategy for transferring a simulation-trained policy to the real world by falling back to the prior controller in uncertain states. We show the efficacy of our Multiplicative Controller Fusion approach on the task of robot navigation and demonstrate safe transfer from simulation to the real world without any fine-tuning. The code for this project is made publicly available at https://sites.google.com/view/mcf-nav/home
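
The abstract does not spell out the fusion rule, so the following is only a minimal sketch under one plausible reading of "multiplicative": the policy and the prior controller each output a one-dimensional Gaussian over actions, and the two are combined by taking their normalized product. The function name `fuse_gaussians` and all numeric values are hypothetical and not taken from the paper or its code. The sketch illustrates why such a product falls back to the prior when the policy is uncertain: the fused mean is a precision-weighted average, so the distribution with the smaller standard deviation dominates.

```python
def fuse_gaussians(mu_policy, sigma_policy, mu_prior, sigma_prior):
    """Normalized product of two 1-D Gaussians; returns (mean, std).

    Assumption for illustration only: both controllers expose a mean and a
    standard deviation over a scalar action.
    """
    prec_policy = 1.0 / sigma_policy ** 2   # precision = 1 / variance
    prec_prior = 1.0 / sigma_prior ** 2
    var = 1.0 / (prec_policy + prec_prior)  # fused variance
    # Fused mean is the precision-weighted average of the two means.
    mu = var * (prec_policy * mu_policy + prec_prior * mu_prior)
    return mu, var ** 0.5


# Hypothetical scenario: the policy proposes action +1.0, the prior proposes -1.0.
# A confident policy (small sigma) dominates the fused action.
mu_confident, _ = fuse_gaussians(1.0, 0.1, -1.0, 1.0)
# An uncertain policy (large sigma) lets the prior take over.
mu_uncertain, _ = fuse_gaussians(1.0, 10.0, -1.0, 1.0)
# With equal uncertainty the fused mean sits halfway between the two.
mu_equal, sigma_equal = fuse_gaussians(1.0, 1.0, -1.0, 1.0)
```

In this sketch the fused standard deviation is never larger than either input, which is a general property of a Gaussian product rather than a claim about the authors' implementation.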

Paper Title