Paper Title
The Primacy Bias in Deep Reinforcement Learning
Paper Authors
Paper Abstract
This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later. Because of training on progressively growing datasets, deep RL agents incur a risk of overfitting to earlier experiences, negatively affecting the rest of the learning process. Inspired by cognitive science, we refer to this effect as the primacy bias. Through a series of experiments, we dissect the algorithmic aspects of deep RL that exacerbate this bias. We then propose a simple yet generally-applicable mechanism that tackles the primacy bias by periodically resetting a part of the agent. We apply this mechanism to algorithms in both discrete (Atari 100k) and continuous action (DeepMind Control Suite) domains, consistently improving their performance.
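The abstract's core mechanism is periodically re-initializing a part of the agent while keeping the collected experience. Below is a minimal, hypothetical PyTorch sketch of that idea, assuming a small Q-network whose final layer is reset on a fixed schedule; names such as `make_q_network`, `reset_last_layers`, and `reset_interval` are illustrative assumptions, not the paper's actual code or hyperparameters.

```python
# Sketch of periodic resets: every `reset_interval` steps, re-initialize the
# last layer(s) of the agent's network while the replay buffer is kept intact.
import torch
import torch.nn as nn


def make_q_network(obs_dim: int, n_actions: int) -> nn.Sequential:
    """Small MLP Q-network; only its final layer is periodically reset."""
    return nn.Sequential(
        nn.Linear(obs_dim, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),   # earlier layers kept across resets
        nn.Linear(256, n_actions),        # head re-initialized on each reset
    )


def reset_last_layers(network: nn.Sequential, n_layers: int = 1) -> None:
    """Re-initialize the parameters of the last `n_layers` Linear layers."""
    linear_layers = [m for m in network if isinstance(m, nn.Linear)]
    for layer in linear_layers[-n_layers:]:
        layer.reset_parameters()


# Illustrative training-loop skeleton (environment and optimizer details omitted).
q_net = make_q_network(obs_dim=8, n_actions=4)
reset_interval = 40_000  # assumed value; a tunable hyperparameter in practice

for step in range(1, 100_001):
    # ... collect experience into a replay buffer and run gradient updates ...
    if step % reset_interval == 0:
        reset_last_layers(q_net, n_layers=1)  # buffer and earlier layers survive
```

The design intent, per the abstract, is that the retained replay buffer lets the reset part of the agent quickly relearn from all past experience rather than being anchored to its earliest interactions.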