Paper Title
The Primacy Bias in Deep Reinforcement Learning
Paper Authors
Paper Abstract
This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later. Because of training on progressively growing datasets, deep RL agents incur a risk of overfitting to earlier experiences, negatively affecting the rest of the learning process. Inspired by cognitive science, we refer to this effect as the primacy bias. Through a series of experiments, we dissect the algorithmic aspects of deep RL that exacerbate this bias. We then propose a simple yet generally-applicable mechanism that tackles the primacy bias by periodically resetting a part of the agent. We apply this mechanism to algorithms in both discrete (Atari 100k) and continuous action (DeepMind Control Suite) domains, consistently improving their performance.
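The abstract's core mechanism is periodically re-initializing a part of the agent while keeping the collected experience. Below is a minimal, hypothetical PyTorch sketch of that idea, assuming a small Q-network whose final layer is reset on a fixed schedule; names such as `make_q_network`, `reset_last_layers`, and `reset_interval` are illustrative assumptions, not the paper's actual code or hyperparameters.

```python
# Sketch of periodic resets: every `reset_interval` steps, re-initialize the
# last layer(s) of the agent's network while the replay buffer is kept intact.
import torch
import torch.nn as nn


def make_q_network(obs_dim: int, n_actions: int) -> nn.Sequential:
    """Small MLP Q-network; only its final layer is periodically reset."""
    return nn.Sequential(
        nn.Linear(obs_dim, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),   # earlier layers kept across resets
        nn.Linear(256, n_actions),        # head re-initialized on each reset
    )


def reset_last_layers(network: nn.Sequential, n_layers: int = 1) -> None:
    """Re-initialize the parameters of the last `n_layers` Linear layers."""
    linear_layers = [m for m in network if isinstance(m, nn.Linear)]
    for layer in linear_layers[-n_layers:]:
        layer.reset_parameters()


# Illustrative training-loop skeleton (environment and optimizer details omitted).
q_net = make_q_network(obs_dim=8, n_actions=4)
reset_interval = 40_000  # assumed value; a tunable hyperparameter in practice

for step in range(1, 100_001):
    # ... collect experience into a replay buffer and run gradient updates ...
    if step % reset_interval == 0:
        reset_last_layers(q_net, n_layers=1)  # buffer and earlier layers survive
```

The design intent, per the abstract, is that the retained replay buffer lets the reset part of the agent quickly relearn from all past experience rather than being anchored to its earliest interactions.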