Paper Title


Contextual Latent-Movements Off-Policy Optimization for Robotic Manipulation Skills

Authors

Samuele Tosatto, Georgia Chalvatzaki, Jan Peters

Abstract


Parameterized movement primitives have been extensively used for imitation learning of robotic tasks. However, the high dimensionality of the parameter space hinders the improvement of such primitives in the reinforcement learning (RL) setting, especially for learning with physical robots. In this paper, we propose a novel view on handling demonstrated trajectories for acquiring low-dimensional, non-linear latent dynamics, using a mixture of probabilistic principal component analyzers (MPPCA) on the movements' parameter space. Moreover, we introduce a new contextual off-policy RL algorithm, named LAtent-Movements Policy Optimization (LAMPO). LAMPO provides gradient estimates from previous experience using self-normalized importance sampling, thereby making full use of samples collected in earlier learning iterations. Combined, these advantages provide a complete framework for sample-efficient off-policy optimization of movement primitives in robot learning of high-dimensional manipulation skills. Our experimental results, obtained both in simulation and on a real robot, show that LAMPO yields sample-efficient policies compared with common approaches in the literature.
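To make the dimensionality-reduction step concrete, the following Python sketch shows one way to recover a local linear latent space from demonstrated movement-primitive parameters. This is not the paper's implementation: scikit-learn has no MPPCA estimator, so a full-covariance Gaussian mixture is used as a stand-in (MPPCA is a constrained special case of such a mixture), and the data `W`, the component count, and the latent dimension are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder data: N demonstrations, each encoded as a D-dimensional
# movement-primitive weight vector (illustrative, not the paper's data).
rng = np.random.default_rng(0)
W = rng.normal(size=(200, 30))

# Full-covariance GMM as a stand-in for MPPCA: each component's covariance
# is eigendecomposed to expose a local low-dimensional linear subspace.
gmm = GaussianMixture(n_components=3, covariance_type="full",
                      random_state=0).fit(W)

def to_latent(w, k, n_latent=4):
    """Project parameter vector w onto component k's leading principal axes."""
    _, eigvec = np.linalg.eigh(gmm.covariances_[k])
    U = eigvec[:, ::-1][:, :n_latent]      # top eigenvectors (largest variance)
    return U.T @ (w - gmm.means_[k])       # local latent coordinates

k = int(gmm.predict(W[:1])[0])             # most responsible component
z = to_latent(W[0], k)                     # low-dimensional representation
```

Searching and optimizing over `z` instead of the full 30-dimensional weight vector is what makes RL on top of imitation-learned primitives tractable with few rollouts.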
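The off-policy gradient estimate in the abstract rests on self-normalized importance sampling: returns collected under previous policies are reweighted by density ratios whose weights are normalized to sum to one, which bounds the estimator's variance at the cost of a small bias. A minimal NumPy sketch of that estimator follows; the function name and the log-density inputs are assumptions for illustration, not LAMPO's actual interface.

```python
import numpy as np

def snis_policy_gradient(grad_log_pi, returns, logp_new, logp_old):
    """Self-normalized importance-sampling policy-gradient estimate.

    grad_log_pi : (N, d) score vectors  grad_theta log pi_theta(a_i | s_i)
    returns     : (N,)   returns observed under the behavior policy
    logp_new    : (N,)   log-density of each sample under the current policy
    logp_old    : (N,)   log-density under the behavior (old) policy
    """
    log_w = logp_new - logp_old        # log importance ratios
    log_w -= log_w.max()               # shift for numerical stability
    w = np.exp(log_w)
    w /= w.sum()                       # self-normalization: weights sum to 1
    # Weighted off-policy estimate of E[ grad log pi * return ].
    return (w[:, None] * grad_log_pi * returns[:, None]).sum(axis=0)
```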
