Title
Cross Learning in Deep Q-Networks
Abstract
In this work, we propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods. The problem is particularly acute in deep Q-networks, where function approximation errors exacerbate overestimation. Our algorithm builds on double Q-learning: it maintains a set of parallel models and estimates the Q-value using a randomly selected network, which reduces both the overestimation bias and the variance. We provide empirical evidence for the advantages of our method by evaluating it on several benchmark environments. The experimental results show significant improvements in reducing the overestimation bias and stabilizing training, which in turn lead to better derived policies.
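As a rough illustration of the idea described in the abstract, the following is a minimal tabular sketch of cross Q-learning. It assumes the method generalizes the double Q-learning update to K estimators. At each step, the sketch updates one randomly chosen table, selects the greedy next action with that same table, and evaluates that action with a different, randomly chosen table. The class name `CrossQLearner`, its parameters, and the choice to act on the mean of all tables are hypothetical choices made for this sketch, not the authors' exact method. A deep version would replace the tables with K Q-networks.

```python
import random
from collections import defaultdict


class CrossQLearner:
    """Hypothetical tabular sketch of cross Q-learning with K estimators.

    Assumption: each update picks one table to update, selects the greedy
    next action with that table, and evaluates it with another randomly
    chosen table. With k == 2 this is ordinary double Q-learning.
    """

    def __init__(self, n_actions, k=4, alpha=0.1, gamma=0.99, seed=0):
        if k < 2:
            raise ValueError("cross Q-learning needs at least two estimators")
        self.n_actions = n_actions
        self.k = k
        self.alpha = alpha
        self.gamma = gamma
        self.rng = random.Random(seed)
        # One independent table per estimator: state -> list of action values.
        self.tables = [defaultdict(lambda: [0.0] * n_actions) for _ in range(k)]

    def q_mean(self, state):
        """Average of all estimators (acting on the mean is an assumption)."""
        return [sum(t[state][a] for t in self.tables) / self.k
                for a in range(self.n_actions)]

    def act(self, state, epsilon=0.1):
        """Epsilon-greedy action with respect to the ensemble mean."""
        if self.rng.random() < epsilon:
            return self.rng.randrange(self.n_actions)
        q = self.q_mean(state)
        return max(range(self.n_actions), key=q.__getitem__)

    def update(self, state, action, reward, next_state, done):
        """One cross Q-learning step; returns (updated table, evaluator table)."""
        i = self.rng.randrange(self.k)  # estimator to update
        j = self.rng.choice([m for m in range(self.k) if m != i])  # evaluator
        q_i = self.tables[i]
        if done:
            target = reward
        else:
            next_q_i = q_i[next_state]
            # Select the next action with table i, evaluate it with table j.
            a_star = max(range(self.n_actions), key=next_q_i.__getitem__)
            target = reward + self.gamma * self.tables[j][next_state][a_star]
        q_i[state][action] += self.alpha * (target - q_i[state][action])
        return i, j
```

With `k=2`, the update reduces to standard double Q-learning. Intuitively, a larger `k` decouples action selection from action evaluation across more independent estimators, and this is the mechanism the abstract credits with lowering both bias and variance.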