Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress

Paper Title

Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress

Authors

Agarwal, Rishabh; Schwarzer, Max; Castro, Pablo Samuel; Courville, Aaron; Bellemare, Marc G.

Abstract

Learning tabula rasa, that is without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research. However, RL systems, when applied to large-scale settings, rarely operate tabula rasa. Such large-scale systems undergo multiple design or algorithmic changes during their development cycle and use ad hoc approaches for incorporating these changes without re-training from scratch, which would have been prohibitively expensive. Additionally, the inefficiency of deep RL typically excludes researchers without access to industrial-scale resources from tackling computationally-demanding problems. To address these issues, we present reincarnating RL as an alternative workflow or class of problem settings, where prior computational work (e.g., learned policies) is reused or transferred between design iterations of an RL agent, or from one RL agent to another. As a step towards enabling reincarnating RL from any agent to any other agent, we focus on the specific setting of efficiently transferring an existing sub-optimal policy to a standalone value-based RL agent. We find that existing approaches fail in this setting and propose a simple algorithm to address their limitations. Equipped with this algorithm, we demonstrate reincarnating RL's gains over tabula rasa RL on Atari 2600 games, a challenging locomotion task, and the real-world problem of navigating stratospheric balloons. Overall, this work argues for an alternative approach to RL research, which we believe could significantly improve real-world RL adoption and help democratize it further. Open-sourced code and trained agents at https://agarwl.github.io/reincarnating_rl.
