Paper Title
Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning
Paper Authors
Paper Abstract
Model-based reinforcement learning (RL) has shown great potential in various control tasks in terms of both sample-efficiency and final performance. However, learning a generalizable dynamics model robust to changes in dynamics remains a challenge since the target transition dynamics follow a multi-modal distribution. In this paper, we present a new model-based RL algorithm, coined trajectory-wise multiple choice learning, that learns a multi-headed dynamics model for dynamics generalization. The main idea is to update the most accurate prediction head so that each head specializes in certain environments with similar dynamics, i.e., clustering environments. Moreover, we incorporate context learning, which encodes dynamics-specific information from past experiences into the context latent vector, enabling the model to perform online adaptation to unseen environments. Finally, to utilize the specialized prediction heads more effectively, we propose an adaptive planning method, which selects the most accurate prediction head over a recent experience. Our method exhibits superior zero-shot generalization performance across a variety of control tasks, compared to state-of-the-art RL methods. Source code and videos are available at https://sites.google.com/view/trajectory-mcl.
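A minimal sketch (in Python/PyTorch) of the two ideas the abstract describes: a multi-headed dynamics model conditioned on a learned context vector, and a trajectory-wise winner-takes-all (multiple choice learning) loss that updates only the head with the lowest error over each whole trajectory. The class and function names, the GRU context encoder, and all network sizes are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: a multi-headed dynamics model with a context encoder and a
# trajectory-wise multiple choice learning loss. Architecture details are
# assumptions made for illustration; they are not taken from the paper.
import torch
import torch.nn as nn


class MultiHeadedDynamicsModel(nn.Module):
    def __init__(self, state_dim, action_dim, context_dim=16, hidden_dim=200, num_heads=3):
        super().__init__()
        # Context encoder: summarizes a short window of past transitions
        # (state, action, next state) into a latent vector carrying
        # dynamics-specific information.
        self.context_encoder = nn.GRU(
            input_size=state_dim + action_dim + state_dim,
            hidden_size=context_dim,
            batch_first=True,
        )
        # Shared backbone followed by several prediction heads.
        self.backbone = nn.Sequential(
            nn.Linear(state_dim + action_dim + context_dim, hidden_dim),
            nn.ReLU(),
        )
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, state_dim) for _ in range(num_heads)]
        )

    def forward(self, state, action, past_transitions):
        # past_transitions: (batch, window, state_dim + action_dim + state_dim)
        _, h = self.context_encoder(past_transitions)
        context = h[-1]                                    # (batch, context_dim)
        features = self.backbone(torch.cat([state, action, context], dim=-1))
        # One next-state prediction per head: (num_heads, batch, state_dim)
        return torch.stack([head(features) for head in self.heads], dim=0)


def trajectory_wise_mcl_loss(predictions, next_states):
    """Winner-takes-all loss over whole trajectories.

    predictions: (num_heads, batch, horizon, state_dim), per-head rollouts.
    next_states: (batch, horizon, state_dim), ground-truth targets.
    Only the head with the smallest error aggregated over each trajectory
    receives gradients for that trajectory, so heads specialize in clusters
    of environments with similar dynamics.
    """
    errors = ((predictions - next_states.unsqueeze(0)) ** 2).mean(dim=(2, 3))  # (heads, batch)
    best_head = errors.argmin(dim=0)                       # (batch,)
    return errors.gather(0, best_head.unsqueeze(0)).mean()
```

In use, one would roll the model over each trajectory, stack every head's per-step predictions along the horizon, and apply `trajectory_wise_mcl_loss` so that each trajectory trains a single specialized head; at planning time, the heads can be scored on recent transitions and the most accurate one selected, in the spirit of the adaptive planning step the abstract describes.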