Fast Online Adaptation in Robotics through Meta-Learning Embeddings of Simulated Priors

Paper title

Fast Online Adaptation in Robotics through Meta-Learning Embeddings of Simulated Priors

Authors

Kaushik, Rituraj; Anne, Timothée; Mouret, Jean-Baptiste

Abstract

Meta-learning algorithms can accelerate model-based reinforcement learning (MBRL) algorithms by finding an initial set of parameters for the dynamical model such that the model can be trained to match the actual dynamics of the system with only a few data points. However, in the real world, a robot might encounter situations ranging from motor failures to finding itself on rocky terrain, where the dynamics of the robot can differ significantly from one situation to another. In this paper, first, we show that when the meta-training situations (the prior situations) have such diverse dynamics, using a single set of meta-trained parameters as a starting point still requires a large number of observations from the real system to learn a useful model of the dynamics. Second, we propose an algorithm called FAMLE that mitigates this limitation by meta-training several initial starting points (i.e., initial parameters) for training the model and allows the robot to select the most suitable starting point to adapt the model to the current situation with only a few gradient steps. We compare FAMLE to MBRL, MBRL with a model meta-trained with MAML, and the model-free policy search algorithm PPO on various simulated and real robotic tasks, and show that FAMLE allows the robots to adapt to novel damage in significantly fewer time-steps than the baselines.
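
The idea in the abstract of choosing among several starting points and then adapting with a few gradient steps can be illustrated with a toy sketch. This is not the authors' implementation of FAMLE: the scalar linear model, the candidate values, the learning rate, and all function names below are assumptions made purely for illustration, and the candidates are hard-coded rather than meta-trained.

```python
# Toy sketch only: a scalar linear "dynamics model" y = w * x.
# All names and numbers are hypothetical and are not taken from the paper.

def mse(w, data):
    """Mean squared prediction error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def select_starting_point(starting_points, data):
    """Return the candidate initial parameter that best fits the observations."""
    return min(starting_points, key=lambda w: mse(w, data))

def adapt(w, data, steps=5, lr=0.1):
    """Take a few gradient-descent steps on the MSE, starting from w."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# Assumed stand-ins for meta-trained starting points (one per prior situation).
starting_points = [-1.0, 0.5, 2.0]
# A few observations from the "real" system, generated here with w = 1.8.
observations = [(1.0, 1.8), (2.0, 3.6), (-1.0, -1.8)]

w0 = select_starting_point(starting_points, observations)
w_adapted = adapt(w0, observations)
```

In this toy setting the candidate closest to the observed behaviour is selected first, so the few gradient steps only have to close a small remaining gap. A single fixed starting point far from the current situation would need more steps or more data to reach the same error.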
