Real-Robot Deep Reinforcement Learning: Improving Trajectory Tracking of Flexible-Joint Manipulator with Reference Correction

Paper Title

Real-Robot Deep Reinforcement Learning: Improving Trajectory Tracking of Flexible-Joint Manipulator with Reference Correction

Authors

Pavlichenko, Dmytro; Behnke, Sven

Abstract

Flexible-joint manipulators are governed by complex nonlinear dynamics, defining a challenging control problem. In this work, we propose an approach to learn an outer-loop joint trajectory tracking controller with deep reinforcement learning. The controller represented by a stochastic policy is learned in under two hours directly on the real robot. This is achieved through bounded reference correction actions and use of a model-free off-policy learning method. In addition, an informed policy initialization is proposed, where the agent is pre-trained in a learned simulation. We test our approach on the 7 DOF manipulator of a Baxter robot. We demonstrate that the proposed method is capable of consistent learning across multiple runs when applied directly on the real robot. Our method yields a policy which significantly improves the trajectory tracking accuracy in comparison to the vendor-provided controller, generalizing to an unseen payload.
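
The abstract names bounded reference correction actions as the mechanism that makes learning directly on the robot feasible. As an illustration only, the idea might be sketched as follows: the policy output is clipped and scaled to a small per-joint offset, which is added to the reference joint positions before they are passed to the inner-loop controller. The function name `apply_reference_correction` and the bound `max_correction` are hypothetical; the abstract gives neither the actual bound nor the exact form of the action.

```python
import numpy as np


def apply_reference_correction(q_ref, action, max_correction=0.05):
    """Return a corrected joint reference for the inner-loop controller.

    q_ref:          reference joint positions in radians, shape (n_joints,)
    action:         raw policy output, shape (n_joints,)
    max_correction: hypothetical per-joint bound in radians (an assumption,
                    not a value taken from the paper)
    """
    q_ref = np.asarray(q_ref, dtype=float)
    action = np.asarray(action, dtype=float)
    # Clip the raw action to [-1, 1], then scale it, so the correction
    # can never exceed max_correction in magnitude on any joint.
    delta = max_correction * np.clip(action, -1.0, 1.0)
    return q_ref + delta
```

Because the correction is bounded, even an untrained policy can only move the commanded reference a small distance away from the nominal trajectory, which is one plausible reading of why exploration on the real robot stays safe.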
