Paper Title

How Does Return Distribution in Distributional Reinforcement Learning Help Optimization?

Authors

Ke Sun, Bei Jiang, Linglong Kong

Abstract

Distributional reinforcement learning, which focuses on learning the entire return distribution instead of only its expectation as in standard RL, has demonstrated remarkable success in enhancing performance. Despite these advancements, our understanding of how the return distribution in distributional RL helps optimization remains limited. In this study, we investigate the optimization advantages that distributional RL gains over classical RL from its extra return-distribution knowledge, within the Neural Fitted Z-Iteration (Neural FZI) framework. To begin with, we demonstrate that the distribution loss of distributional RL has desirable smoothness characteristics and hence enjoys stable gradients, which is in line with its tendency to promote optimization stability. Furthermore, the acceleration effect of distributional RL is revealed by decomposing the return distribution. This decomposition shows that distributional RL can perform favorably when the return distribution is approximated appropriately, as measured by the variance of gradient estimates in each environment. Rigorous experiments validate the stable optimization behaviors of distributional RL and its acceleration effects compared to classical RL. Our research findings illuminate how the return distribution in distributional RL algorithms helps optimization.
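
To make the contrast in the abstract concrete, below is a minimal sketch (not the paper's code) comparing the classical RL loss, a squared Bellman error on the scalar expected return, with a distributional loss of the kind analyzed in Neural FZI-style algorithms. It assumes a categorical (histogram) parameterization of the return distribution over a fixed atom grid; the value range, atom count, and function names are illustrative assumptions, not the paper's exact implementation. The cross-entropy (KL-type) loss on the predicted distribution is the bounded, smooth objective whose stable gradients the abstract refers to.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch only: categorical return-distribution parameterization
# with a fixed atom grid (hypothetical values, C51-style), not the paper's code.
n_atoms = 51                       # number of support atoms for the return distribution
v_min, v_max = -10.0, 10.0         # assumed return range
atoms = torch.linspace(v_min, v_max, n_atoms)

def classical_loss(q_pred, q_target):
    """Classical RL: mean-squared Bellman error on scalar expected returns."""
    return F.mse_loss(q_pred, q_target)

def distributional_loss(logits_pred, probs_target):
    """Distributional RL: cross-entropy between the (projected) target return
    distribution and the predicted categorical distribution. This KL-type loss
    is smooth in the logits, the property the paper links to stable gradients."""
    log_probs = F.log_softmax(logits_pred, dim=-1)
    return -(probs_target * log_probs).sum(dim=-1).mean()

def expected_return(probs):
    """Recover the scalar Q-value (expectation) from the full return distribution."""
    return (probs * atoms).sum(dim=-1)

# Toy usage on a batch of 4 states.
q_pred = torch.randn(4, requires_grad=True)
q_target = torch.randn(4)
print(classical_loss(q_pred, q_target))

logits_pred = torch.randn(4, n_atoms, requires_grad=True)
probs_target = torch.softmax(torch.randn(4, n_atoms), dim=-1)  # stands in for the projected Bellman target
print(distributional_loss(logits_pred, probs_target))
print(expected_return(probs_target))
```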
