Paper Title
Consistent Dropout for Policy Gradient Reinforcement Learning
Paper Authors
Paper Abstract
Dropout has long been a staple of supervised learning, but it is rarely used in reinforcement learning. We analyze why a naive application of dropout is problematic for policy-gradient learning algorithms and introduce consistent dropout, a simple technique that addresses this instability. We demonstrate that consistent dropout enables stable training with A2C and PPO in both continuous and discrete action environments across a wide range of dropout probabilities. Finally, we show that consistent dropout enables the online training of complex architectures such as GPT without needing to disable the model's native dropout.
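The abstract does not spell out the mechanism, but the instability it refers to arises when dropout masks are resampled between action selection and the gradient update: the log-probabilities in the policy-gradient loss then belong to a different randomly perturbed network than the one that produced the actions. Below is a minimal PyTorch sketch of one way to keep the two forward passes consistent by saving and replaying the mask; the `ConsistentDropout` class and its mask-passing interface are illustrative assumptions, not the paper's reference implementation.

```python
from typing import Optional, Tuple

import torch
import torch.nn as nn


class ConsistentDropout(nn.Module):
    """Dropout layer that can replay the exact mask sampled at rollout time.

    Reusing the stored mask during the policy-gradient update makes the
    update-time forward pass identical to the one used when the action
    was originally sampled.
    """

    def __init__(self, p: float = 0.5):
        super().__init__()
        self.p = p

    def forward(
        self, x: torch.Tensor, mask: Optional[torch.Tensor] = None
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        if self.p == 0.0:
            return x, torch.ones_like(x)
        if mask is None:
            # Rollout: sample a fresh mask; the caller stores it in the
            # rollout buffer alongside the transition.
            keep = torch.bernoulli(torch.full_like(x, 1.0 - self.p))
            mask = keep / (1.0 - self.p)  # inverted-dropout scaling
        # Update: the stored mask is passed back in, so this forward pass
        # reproduces the perturbed network that generated the action.
        return x * mask, mask
```

Under this sketch, a rollout step would call `out, mask = layer(x)` and store `mask` with the transition; the A2C/PPO update would then call `out, _ = layer(x_batch, mask=stored_masks)` with the per-transition masks stacked along the batch dimension, recovering the original activations.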