Paper title

Hierarchical reinforcement learning for in-hand robotic manipulation using Davenport chained rotations

Authors

Francisco Roldan Sanchez, Qiang Wang, David Cordova Bulens, Kevin McGuinness, Stephen Redmond, Noel O'Connor

Abstract

End-to-end reinforcement learning techniques are among the most successful methods for robotic manipulation tasks. However, the training time required to find a good policy capable of solving complex tasks is prohibitively large. Therefore, depending on the computing resources available, it might not be feasible to use such techniques. Using domain knowledge to decompose manipulation tasks into primitive skills, performed in sequence, could reduce the overall complexity of the learning problem, and hence reduce the amount of training required to achieve dexterity. In this paper, we propose the use of Davenport chained rotations to decompose complex 3D rotation goals into a concatenation of a smaller set of simpler rotation skills. State-of-the-art reinforcement-learning-based methods can then be trained using less overall simulated experience. We compare the performance of this approach with the popular Hindsight Experience Replay (HER) method, trained in an end-to-end fashion using the same amount of experience in a simulated robotic hand environment. Despite a general decrease in the performance of the primitive skills when they are executed sequentially, we find that decomposing arbitrary 3D rotations into elementary rotations is beneficial when computing resources are limited: success rates increase by approximately 10% on the most complex 3D rotations relative to HER trained in an end-to-end fashion, and by between 20% and 40% on the simplest rotations.
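To make the decomposition idea concrete, the following is a minimal sketch of splitting an arbitrary 3D rotation into three chained elementary rotations. It assumes the intrinsic z-y'-x'' (yaw, pitch, roll) convention, which is one special case of Davenport chained rotations; the abstract does not state which axes the paper uses, so the convention and all function names here (`rot_x`, `decompose_zyx`, `compose_zyx`, and so on) are illustrative assumptions rather than the authors' implementation.

```python
import math


def rot_x(a):
    """Elementary rotation by angle a (radians) about the x axis."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a):
    """Elementary rotation by angle a (radians) about the y axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(a):
    """Elementary rotation by angle a (radians) about the z axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def compose_zyx(yaw, pitch, roll):
    """Chain the three elementary rotations: R = Rz(yaw) Ry(pitch) Rx(roll)."""
    return matmul(rot_z(yaw), matmul(rot_y(pitch), rot_x(roll)))


def decompose_zyx(r):
    """Return (yaw, pitch, roll) such that compose_zyx(...) reproduces r.

    Assumed convention: intrinsic z-y'-x'' rotations, pitch in [-pi/2, pi/2].
    """
    # In this convention r[2][0] == -sin(pitch); clamp against round-off.
    pitch = math.asin(max(-1.0, min(1.0, -r[2][0])))
    if abs(r[2][0]) < 1.0 - 1e-9:
        yaw = math.atan2(r[1][0], r[0][0])
        roll = math.atan2(r[2][1], r[2][2])
    else:
        # Gimbal lock: yaw and roll are not separately identifiable,
        # so fix roll to zero and put the remaining rotation into yaw.
        yaw = math.atan2(-r[0][1], r[1][1])
        roll = 0.0
    return yaw, pitch, roll


# Usage: an arbitrary goal rotation becomes three single-axis sub-goals,
# each of which could be handed to a separate primitive rotation skill.
goal = compose_zyx(0.3, -0.7, 1.1)
yaw, pitch, roll = decompose_zyx(goal)
rebuilt = compose_zyx(yaw, pitch, roll)
```

In a hierarchical setup of the kind the abstract describes, each of the three angles would serve as the goal of one single-axis rotation skill executed in sequence; how the paper selects axes and orders the skills is not specified in this section.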
