Paper Title
Metric-Based Imitation Learning Between Two Dissimilar Anthropomorphic Robotic Arms
Authors
Abstract
The development of autonomous robotic systems that can learn from human demonstrations to imitate a desired behavior, rather than being manually programmed, has huge technological potential. One major challenge in imitation learning is the correspondence problem: how to establish corresponding states and actions between expert and learner when the embodiments of the agents differ (morphology, dynamics, degrees of freedom, etc.). Many existing approaches in imitation learning circumvent the correspondence problem, for example through kinesthetic teaching or teleoperation, which are performed on the robot. In this work, we explicitly address the correspondence problem by introducing a distance measure between dissimilar embodiments. This measure is then used as a loss function for static pose imitation and as a feedback signal within a model-free deep reinforcement learning framework for dynamic movement imitation between two anthropomorphic robotic arms in simulation. We find that the measure is well suited for describing the similarity between embodiments and for learning imitation policies by distance minimization.
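To make the idea of static pose imitation by distance minimization more concrete, here is a minimal Python sketch. The abstract does not define the paper's distance measure, so everything below is an illustrative assumption rather than the authors' method: the arms are simplified to planar chains, the hypothetical `embodiment_distance` compares points sampled at equal fractions of each arm's length (scaled by total arm length), and `imitate_pose` uses plain finite-difference gradient descent.

```python
import math

# Hypothetical sampling fractions along each arm; not taken from the paper.
FRACTIONS = (0.25, 0.5, 0.75, 1.0)


def joint_positions(link_lengths, joint_angles):
    """Planar forward kinematics: base, every joint, and the tip."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for length, angle in zip(link_lengths, joint_angles):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points


def point_at_fraction(link_lengths, points, fraction):
    """Point at a fraction of the arm's arc length, divided by total length."""
    total = sum(link_lengths)
    target = fraction * total
    travelled = 0.0
    last = len(link_lengths) - 1
    for i, length in enumerate(link_lengths):
        if travelled + length >= target or i == last:
            t = (target - travelled) / length
            (x0, y0), (x1, y1) = points[i], points[i + 1]
            return ((x0 + t * (x1 - x0)) / total,
                    (y0 + t * (y1 - y0)) / total)
        travelled += length


def embodiment_distance(expert_links, expert_angles,
                        learner_links, learner_angles):
    """Assumed measure: squared gaps between matched, length-scaled points."""
    expert_pts = joint_positions(expert_links, expert_angles)
    learner_pts = joint_positions(learner_links, learner_angles)
    total = 0.0
    for f in FRACTIONS:
        ex, ey = point_at_fraction(expert_links, expert_pts, f)
        lx, ly = point_at_fraction(learner_links, learner_pts, f)
        total += (ex - lx) ** 2 + (ey - ly) ** 2
    return total


def imitate_pose(expert_links, expert_angles, learner_links,
                 steps=2000, lr=0.05, eps=1e-6):
    """Minimize the distance over learner angles with numerical gradients."""
    angles = [0.0] * len(learner_links)
    for _ in range(steps):
        base = embodiment_distance(expert_links, expert_angles,
                                   learner_links, angles)
        grad = []
        for i in range(len(angles)):
            bumped = list(angles)
            bumped[i] += eps
            bumped_value = embodiment_distance(expert_links, expert_angles,
                                               learner_links, bumped)
            grad.append((bumped_value - base) / eps)
        angles = [a - lr * g for a, g in zip(angles, grad)]
    return angles
```

For example, calling `imitate_pose([1.0, 1.0], [0.6, 0.8], [0.5, 0.7, 0.8])` asks a three-link learner to approximate the pose of a two-link expert. Because the learner and expert have different numbers of joints, joint angles cannot be copied directly, which is why the comparison is made on matched points in a shared, length-scaled space.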