Paper Title

Importance of using appropriate baselines for evaluation of data-efficiency in deep reinforcement learning for Atari

Paper Authors

Kielak, Kacper

Paper Abstract

Reinforcement learning (RL) has seen great advancements in the past few years. Nevertheless, the consensus among the RL community is that currently used methods, despite all their benefits, suffer from extreme data inefficiency, especially in rich visual domains like Atari. To circumvent this problem, novel approaches were introduced that often claim to be much more efficient than popular variations of the state-of-the-art DQN algorithm. In this paper, however, we demonstrate that the newly proposed techniques simply used unfair baselines in their experiments. Namely, we show that the actual improvement in efficiency came from allowing the algorithm to perform more training updates for each data sample, not from employing the new methods. By allowing DQN to execute network updates more frequently, we manage to reach results similar to or better than the recently proposed advancements, often at a fraction of the complexity and computational cost. Furthermore, based on the outcomes of the study, we argue that an agent similar to the modified DQN presented in this paper should be used as a baseline for any future work aimed at improving the sample efficiency of deep reinforcement learning.
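
The change the abstract describes is purely a matter of the update-to-data ratio: letting DQN take several gradient steps per collected transition instead of one step every few environment steps. Below is a minimal training-loop sketch of that idea, not the paper's actual implementation; the Gym-style `env` and the `agent` object with `act`/`learn` methods, as well as the specific constants, are hypothetical placeholders.

```python
import random
from collections import deque

# Hedged sketch of the abstract's core idea: more gradient updates per
# environment step, not a new algorithm. `env` and `agent` are hypothetical
# placeholders (a Gym-style environment and a DQN-style agent with an
# epsilon-greedy `act` method and a `learn` method doing one TD-loss update).

UPDATES_PER_STEP = 4      # vanilla DQN effectively uses 0.25 (one update every 4 steps)
BATCH_SIZE = 32
TOTAL_STEPS = 100_000     # low-data regime typical of data-efficiency benchmarks

replay_buffer = deque(maxlen=100_000)
state = env.reset()

for step in range(TOTAL_STEPS):
    # Collect one transition from the environment.
    action = agent.act(state)
    next_state, reward, done, _ = env.step(action)
    replay_buffer.append((state, action, reward, next_state, done))
    state = env.reset() if done else next_state

    # The only modification: several network updates per collected sample,
    # instead of one update every few environment steps.
    if len(replay_buffer) >= BATCH_SIZE:
        for _ in range(UPDATES_PER_STEP):
            batch = random.sample(replay_buffer, BATCH_SIZE)
            agent.learn(batch)  # one gradient step on the Q-learning TD loss
```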
