Paper Title
Distributionally Adaptive Meta Reinforcement Learning
Paper Authors
Paper Abstract
Meta-reinforcement learning algorithms provide a data-driven way to acquire policies that quickly adapt to many tasks with varying rewards or dynamics functions. However, learned meta-policies are often effective only on the exact task distribution on which they were trained and struggle in the presence of distribution shift of test-time rewards or transition dynamics. In this work, we develop a framework for meta-RL algorithms that are able to behave appropriately under test-time distribution shifts in the space of tasks. Our framework centers on an adaptive approach to distributional robustness that trains a population of meta-policies to be robust to varying levels of distribution shift. When evaluated on a potentially shifted test-time distribution of tasks, this allows us to choose the meta-policy with the most appropriate level of robustness, and use it to perform fast adaptation. We formally show how our framework allows for improved regret under distribution shift, and empirically show its efficacy on simulated robotics problems under a wide range of distribution shifts.
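The sketch below illustrates the recipe the abstract describes: train a population of meta-policies, each robust to a different level of task-distribution shift, then at test time evaluate their post-adaptation returns and select the most appropriate robustness level. It is a minimal toy in place of a real meta-RL benchmark; the task setup, the widened training distributions as a stand-in for distributionally robust training, and all function names (`train_robust_meta_policy`, `adaptation_return`, the chosen robustness levels) are illustrative assumptions, not the paper's actual implementation.

```python
# Toy 1-D goal-reaching stand-in for the adaptive-robustness framework:
# train meta-policies at several robustness levels, then pick the one
# with the best few-shot adapted return on the shifted test distribution.
import numpy as np

rng = np.random.default_rng(0)


def sample_tasks(mean, std, n):
    """A 'task' is a scalar goal; the reward is -|action - goal|."""
    return rng.normal(mean, std, size=n)


def train_robust_meta_policy(eps, n_tasks=512):
    """Stand-in for meta-training under a widened (eps-robust) task
    distribution: the policy records the range of goals it saw and, at
    adaptation time, can only act within that range."""
    goals = sample_tasks(0.0, 1.0 + eps, n_tasks)  # widened train distribution
    return {"lo": goals.min(), "hi": goals.max()}


def adaptation_return(policy, task_goal):
    """Fast adaptation on one test task: clip the inferred goal to the
    region the meta-policy was trained on, then act there."""
    action = np.clip(task_goal, policy["lo"], policy["hi"])
    return -abs(action - task_goal)


# 1) Train a population of meta-policies at increasing robustness levels.
robustness_levels = [0.0, 0.5, 1.0, 2.0]
population = {eps: train_robust_meta_policy(eps) for eps in robustness_levels}

# 2) The test-time task distribution is shifted (wider than training).
test_goals = sample_tasks(0.0, 2.5, n=64)

# 3) Evaluate each meta-policy's post-adaptation return on a few test
#    tasks and select the most appropriate level of robustness.
mean_returns = {
    eps: float(np.mean([adaptation_return(pi, g) for g in test_goals]))
    for eps, pi in population.items()
}
best_eps = max(mean_returns, key=mean_returns.get)
print(f"selected robustness level eps={best_eps}, "
      f"mean adapted return={mean_returns[best_eps]:.3f}")
```

Under this toy shift, the more robust members of the population cover more of the widened test distribution, so the selection step picks a larger eps; on an unshifted test distribution the least robust (and least conservative) policy would be chosen instead, which is the trade-off the framework is designed to navigate.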