Paper Title

Improving the Diversity of Bootstrapped DQN by Replacing Priors With Noise

Paper Authors

Li Meng, Morten Goodwin, Anis Yazidi, Paal Engelstad

Paper Abstract

Q-learning is one of the most well-known Reinforcement Learning algorithms. There have been tremendous efforts to develop this algorithm using neural networks, and Bootstrapped Deep Q-Learning Network is among them. It utilizes multiple neural network heads to introduce diversity into Q-learning. Diversity can sometimes be viewed as the number of reasonable moves an agent can take in a given state, analogous to the definition of the exploration ratio in RL. Thus, the performance of Bootstrapped Deep Q-Learning Network is deeply connected to the level of diversity within the algorithm. The original research pointed out that a random prior could improve the performance of the model. In this article, we further explore the possibility of replacing priors with noise, sampling the noise from a Gaussian distribution to introduce more diversity into the algorithm. We conduct our experiments on the Atari benchmark and compare our algorithm to both the original and other related algorithms. The results show that our modification of the Bootstrapped Deep Q-Learning algorithm achieves significantly higher evaluation scores across different types of Atari games. We thus conclude that replacing priors with noise can improve the performance of Bootstrapped Deep Q-Learning by ensuring the integrity of its diversity.
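To make the idea concrete, below is a minimal sketch (not the authors' released code) contrasting a Bootstrapped DQN head that adds a fixed random prior network with a variant in which the prior is replaced by noise sampled from a Gaussian distribution. The head architecture, the noise scale `sigma`, and the choice to inject noise into the head's Q-values only during training are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch, assuming a PyTorch Bootstrapped DQN setup with K heads over
# shared features. PriorHead follows the "random prior" idea; NoisyHead is a
# hypothetical variant that replaces the prior with zero-mean Gaussian noise.
import torch
import torch.nn as nn


class QHead(nn.Module):
    """One bootstrap head mapping shared features to Q-values."""

    def __init__(self, feature_dim: int, num_actions: int):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(feature_dim, 128), nn.ReLU(), nn.Linear(128, num_actions)
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.fc(features)


class PriorHead(nn.Module):
    """Prior variant: trainable head plus a frozen, randomly initialized prior."""

    def __init__(self, feature_dim: int, num_actions: int, prior_scale: float = 1.0):
        super().__init__()
        self.head = QHead(feature_dim, num_actions)
        self.prior = QHead(feature_dim, num_actions)
        for p in self.prior.parameters():   # the prior network is never trained
            p.requires_grad_(False)
        self.prior_scale = prior_scale

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.head(features) + self.prior_scale * self.prior(features)


class NoisyHead(nn.Module):
    """Noise variant sketched from the abstract: the random prior is replaced by
    zero-mean Gaussian noise added to the head's Q-values during training."""

    def __init__(self, feature_dim: int, num_actions: int, sigma: float = 0.1):
        super().__init__()
        self.head = QHead(feature_dim, num_actions)
        self.sigma = sigma                  # assumed noise scale

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        q = self.head(features)
        if self.training:                   # perturb Q-values only while training
            q = q + self.sigma * torch.randn_like(q)
        return q


if __name__ == "__main__":
    feats = torch.randn(32, 512)            # e.g. CNN features of Atari frames
    heads = nn.ModuleList(NoisyHead(512, 6) for _ in range(10))   # K = 10 heads
    q_values = torch.stack([h(feats) for h in heads])             # (K, batch, actions)
    print(q_values.shape)
```

In this reading, each head still sees a different perturbation of its Q-estimates, so the ensemble keeps the diversity that the fixed prior was meant to provide, but the perturbation is resampled rather than frozen at initialization.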
