MELD: Meta-Reinforcement Learning from Images via Latent State Models

Paper title

MELD: Meta-Reinforcement Learning from Images via Latent State Models

Authors

Tony Z. Zhao, Anusha Nagabandi, Kate Rakelly, Chelsea Finn, Sergey Levine

Abstract

Meta-reinforcement learning algorithms can enable autonomous agents, such as robots, to quickly acquire new behaviors by leveraging prior experience in a set of related training tasks. However, the onerous data requirements of meta-training, compounded with the challenge of learning from sensory inputs such as images, have made meta-RL challenging to apply to real robotic systems. Latent state models, which learn compact state representations from a sequence of observations, can accelerate representation learning from visual inputs. In this paper, we leverage the perspective of meta-learning as task inference to show that latent state models can also perform meta-learning given an appropriately defined observation space. Building on this insight, we develop meta-RL with latent dynamics (MELD), an algorithm for meta-RL from images that performs inference in a latent state model to quickly acquire new skills given observations and rewards. MELD outperforms prior meta-RL methods on several simulated image-based robotic control problems, and enables a real WidowX robotic arm to insert an Ethernet cable into new locations given a sparse task completion signal after only 8 hours of real-world meta-training. To our knowledge, MELD is the first meta-RL algorithm trained in a real-world robotic control setting from images.
