Paper Title

The Primacy Bias in Deep Reinforcement Learning

Paper Authors

Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville

Paper Abstract

This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later. Because of training on progressively growing datasets, deep RL agents incur a risk of overfitting to earlier experiences, negatively affecting the rest of the learning process. Inspired by cognitive science, we refer to this effect as the primacy bias. Through a series of experiments, we dissect the algorithmic aspects of deep RL that exacerbate this bias. We then propose a simple yet generally-applicable mechanism that tackles the primacy bias by periodically resetting a part of the agent. We apply this mechanism to algorithms in both discrete (Atari 100k) and continuous action (DeepMind Control Suite) domains, consistently improving their performance.
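
The "periodically resetting a part of the agent" mechanism mentioned in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, assuming a small PyTorch Q-network whose final layer is reinitialized at a fixed interval while the earlier layers and the collected experience are kept; the network, the choice of which layers to reset, and the reset interval are illustrative placeholders, not the paper's exact per-algorithm configuration.

```python
# Minimal sketch of periodic agent resets (illustrative only).
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.head = nn.Linear(256, n_actions)  # the part reset periodically

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(obs))


def reset_head(net: QNetwork) -> None:
    """Re-initialize only the final layer; earlier layers and the replay
    buffer are kept, so the agent relearns from all collected experience."""
    net.head.reset_parameters()


if __name__ == "__main__":
    net = QNetwork(obs_dim=8, n_actions=4)
    optimizer = torch.optim.Adam(net.parameters(), lr=3e-4)
    reset_interval = 10_000  # assumed value; tuned per algorithm in practice

    for step in range(1, 30_001):
        # ... collect experience and run a gradient update here ...
        if step % reset_interval == 0:
            reset_head(net)
            # Re-create the optimizer so stale moment estimates for the old
            # head parameters do not leak into the freshly initialized one.
            optimizer = torch.optim.Adam(net.parameters(), lr=3e-4)
```

In this sketch the optimizer is rebuilt after each reset; whether optimizer state is also reset is a design choice, and the interval and reset depth would be tuned per algorithm and domain.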
