Paper Title
Deep Q-Learning: Theoretical Insights from an Asymptotic Analysis
Paper Authors
Paper Abstract
Deep Q-Learning is an important reinforcement learning algorithm, which involves training a deep neural network, called Deep Q-Network (DQN), to approximate the well-known Q-function. Although wildly successful under laboratory conditions, serious gaps between theory and practice as well as a lack of formal guarantees prevent its use in the real world. Adopting a dynamical systems perspective, we provide a theoretical analysis of a popular version of Deep Q-Learning under realistic and verifiable assumptions. More specifically, we prove an important result on the convergence of the algorithm, characterizing the asymptotic behavior of the learning process. Our result sheds light on hitherto unexplained properties of the algorithm and helps understand empirical observations, such as performance inconsistencies even after training. Unlike previous theories, our analysis accommodates state Markov processes with multiple stationary distributions. In spite of the focus on Deep Q-Learning, we believe that our theory may be applied to understand other deep learning algorithms.
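For background only, the abstract's reference to a DQN approximating the Q-function can be illustrated by the standard formulation from the DQN literature; this is a minimal sketch and not necessarily the exact variant analyzed in the paper. Here $\gamma$ denotes the discount factor, $\theta$ the DQN weights, and $\theta^{-}$ the periodically updated target-network weights (standard notation, not taken from this paper):

\[
Q^{*}(s,a) = \mathbb{E}\!\left[\, r(s,a) + \gamma \max_{a'} Q^{*}(s',a') \,\right],
\qquad
L(\theta) = \mathbb{E}_{(s,a,r,s')}\!\left[ \Big( r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a;\theta) \Big)^{2} \right].
\]

In this standard setup, the network is trained by stochastic gradient descent on $L(\theta)$ using samples drawn from a replay buffer.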