RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

Title

RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

Authors

Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, Nando de Freitas

Abstract

Offline methods for reinforcement learning have the potential to help bridge the gap between reinforcement learning research and real-world applications. They make it possible to learn policies from offline datasets, thus overcoming concerns associated with online data collection in the real world, including cost, safety, or ethical concerns. In this paper, we propose a benchmark called RL Unplugged to evaluate and compare offline RL methods. RL Unplugged includes data from a diverse range of domains including games (e.g., Atari benchmark) and simulated motor control problems (e.g., DM Control Suite). The datasets include domains that are partially or fully observable, use continuous or discrete actions, and have stochastic or deterministic dynamics. We propose detailed evaluation protocols for each domain in RL Unplugged and provide an extensive analysis of supervised learning and offline RL methods using these protocols. We will release data for all our tasks and open-source all algorithms presented in this paper. We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community. Moving forward, we view RL Unplugged as a living benchmark suite that will evolve and grow with datasets contributed by the research community and ourselves. Our project page is available at https://git.io/JJUhd.
