Paper Title
Preventing Value Function Collapse in Ensemble Q-Learning by Maximizing Representation Diversity
Paper Authors
Paper Abstract
The classic DQN algorithm is limited by the overestimation bias of the learned Q-function. Subsequent algorithms have proposed techniques to reduce this problem, without fully eliminating it. Recently, the Maxmin and Ensemble Q-learning algorithms have used the different estimates provided by an ensemble of learners to reduce the overestimation bias. Unfortunately, these learners can converge to the same point in the parametric or representation space, falling back to the classic single-network DQN. In this paper, we describe a regularization technique to maximize ensemble diversity in these algorithms. We propose and compare five regularization functions inspired by economics theory and consensus optimization. We show that the regularized approach significantly outperforms the Maxmin and Ensemble Q-learning algorithms as well as non-ensemble baselines.
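To make the two ideas in the abstract concrete, the sketch below shows (1) a Maxmin-style Bellman target that takes the elementwise minimum over an ensemble's next-state Q-estimates, and (2) one possible diversity regularizer that penalizes learners for collapsing to the same representation. The ensemble size, feature vectors, and the pairwise-distance penalty are illustrative assumptions for exposition, not the paper's exact loss functions.

```python
# Hedged sketch: Maxmin Q-learning target plus a representation-diversity
# penalty. The regularizer shown (negative mean pairwise L2 distance) is an
# assumption for illustration, not necessarily one of the paper's five.
import math

def maxmin_target(reward, gamma, next_q_per_learner):
    """Bellman target using the elementwise min over the ensemble's
    next-state action values, which counteracts overestimation bias."""
    min_q = [min(qs) for qs in zip(*next_q_per_learner)]  # per-action min over learners
    return reward + gamma * max(min_q)                    # greedy w.r.t. the min-estimate

def diversity_penalty(features):
    """Negative mean pairwise L2 distance between learners' representations;
    adding this term to the loss pushes learners apart in representation space,
    discouraging the collapse to a single effective network."""
    n, total = len(features), 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.dist(features[i], features[j])
    return -total / (n * (n - 1) / 2)

# Two learners' Q-values for 3 actions at the next state:
q_ensemble = [[1.0, 2.0, 0.5], [0.8, 1.5, 0.9]]
target = maxmin_target(reward=1.0, gamma=0.9, next_q_per_learner=q_ensemble)
# min over learners -> [0.8, 1.5, 0.5]; max -> 1.5; target = 1.0 + 0.9 * 1.5
print(target)                                         # 2.35
print(diversity_penalty([[0.0, 0.0], [3.0, 4.0]]))    # -5.0
```

In training, the penalty would be weighted and added to each learner's TD loss, so that minimizing the total loss simultaneously fits the Maxmin targets and maximizes ensemble diversity.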