Paper Title

REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer

Paper Authors

Xingyu Liu, Deepak Pathak, Kris M. Kitani

Paper Abstract

A popular paradigm in robotic learning is to train a policy from scratch for every new robot. This is not only inefficient but also often impractical for complex robots. In this work, we consider the problem of transferring a policy across two different robots with significantly different parameters such as kinematics and morphology. Existing approaches that train a new policy by matching the action or state transition distribution, including imitation learning methods, fail due to optimal action and/or state distribution being mismatched in different robots. In this paper, we propose a novel method named $REvolveR$ of using continuous evolutionary models for robotic policy transfer implemented in a physics simulator. We interpolate between the source robot and the target robot by finding a continuous evolutionary change of robot parameters. An expert policy on the source robot is transferred through training on a sequence of intermediate robots that gradually evolve into the target robot. Experiments on a physics simulator show that the proposed continuous evolutionary model can effectively transfer the policy across robots and achieve superior sample efficiency on new robots. The proposed method is especially advantageous in sparse reward settings where exploration can be significantly reduced. Code is released at https://github.com/xingyul/revolver.
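The abstract outlines the core procedure: interpolate the robot parameters from the source robot to the target robot, then fine-tune the expert policy on each intermediate robot in sequence. Below is a minimal sketch of that idea under simplifying assumptions; the names `transfer_policy`, `make_env`, `finetune`, `num_stages`, and `steps_per_stage`, as well as the linear parameter blend, are illustrative and do not reflect the API of the released code at https://github.com/xingyul/revolver.

```python
import numpy as np

def transfer_policy(source_params, target_params, expert_policy,
                    make_env, finetune, num_stages=100, steps_per_stage=10_000):
    """Transfer expert_policy from the source robot to the target robot by
    fine-tuning on a sequence of interpolated intermediate robots.

    make_env(params)             -> simulator environment for a robot with those parameters
    finetune(policy, env, steps) -> policy fine-tuned on env with RL for `steps` steps
    Both callables are assumed to be supplied by the user; they are placeholders,
    not part of the paper's released implementation.
    """
    policy = expert_policy
    # alpha = 0 is the source robot (where the expert policy already works);
    # alpha = 1 is the target robot, reached through gradually evolving intermediates.
    for alpha in np.linspace(0.0, 1.0, num_stages + 1)[1:]:
        # Blend kinematic/morphological parameters between the two robots.
        # A simple linear blend is used here for illustration; the paper defines
        # the evolution over the robot model inside the physics simulator.
        intermediate = {k: (1.0 - alpha) * source_params[k] + alpha * target_params[k]
                        for k in source_params}
        env = make_env(intermediate)
        policy = finetune(policy, env, steps_per_stage)
    return policy  # policy adapted to the target robot (alpha == 1)
```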
