Paper Title

The Effectiveness of World Models for Continual Reinforcement Learning

Authors

Kessler, Samuel, Ostaszewski, Mateusz, Bortkiewicz, Michał, Żarski, Mateusz, Wołczyk, Maciej, Parker-Holder, Jack, Roberts, Stephen J., Miłoś, Piotr

Abstract

World models power some of the most efficient reinforcement learning algorithms. In this work, we show that they can also be harnessed for continual learning, a setting in which the agent faces changing environments. World models typically employ a replay buffer for training, which can be naturally extended to continual learning. We systematically study how different selective experience replay methods affect performance, forgetting, and transfer. We also provide recommendations regarding various modeling options for using world models. The best set of choices, called Continual-Dreamer, is task-agnostic and utilizes the world model for continual exploration. Continual-Dreamer is sample efficient and outperforms state-of-the-art task-agnostic continual reinforcement learning methods on the Minigrid and Minihack benchmarks.
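The abstract highlights selective experience replay as the mechanism that extends a world model's replay buffer to continual learning. One common task-agnostic selection strategy is reservoir sampling, which maintains a uniform sample over the entire stream of transitions without needing task boundaries. The sketch below is illustrative only (class and method names are my own, not the paper's implementation):

```python
import random


class ReservoirReplayBuffer:
    """Fixed-capacity replay buffer that keeps a uniform sample over the
    full stream of transitions via reservoir sampling, so experience from
    earlier tasks is retained without any task labels (task-agnostic)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.num_seen = 0  # total transitions observed so far
        self.rng = random.Random(seed)

    def add(self, transition):
        self.num_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            # Keep the new transition with probability capacity / num_seen
            # by overwriting a uniformly chosen slot; this makes every
            # transition seen so far equally likely to be in the buffer.
            j = self.rng.randrange(self.num_seen)
            if j < self.capacity:
                self.buffer[j] = transition

    def sample(self, batch_size):
        return self.rng.sample(self.buffer, min(batch_size, len(self.buffer)))


# Usage: stream transitions from two sequential "tasks"; the buffer ends
# up holding a mix of both, even though it never saw task identities.
buf = ReservoirReplayBuffer(capacity=100, seed=0)
for t in range(1000):
    buf.add(("task1", t))
for t in range(1000):
    buf.add(("task2", t))
```

Because each stored transition is a uniform draw from everything seen, roughly half the buffer comes from each task here, which is what lets world-model training on this buffer mitigate forgetting of earlier environments.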
