Paper Title

On the Convergence of SARSA with Linear Function Approximation

Authors

Shangtong Zhang, Remi Tachet, Romain Laroche

Abstract


SARSA, a classical on-policy control algorithm for reinforcement learning, is known to chatter when combined with linear function approximation: SARSA does not diverge but oscillates in a bounded region. However, little is known about how fast SARSA converges to that region and how large the region is. In this paper, we make progress towards this open problem by showing the convergence rate of projected SARSA to a bounded region. Importantly, the region is much smaller than the region that we project into, provided that the magnitude of the reward is not too large. Existing works regarding the convergence of linear SARSA to a fixed point all require the Lipschitz constant of SARSA's policy improvement operator to be sufficiently small; our analysis instead applies to arbitrary Lipschitz constants and thus characterizes the behavior of linear SARSA for a new regime.
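To make the object of study concrete, here is a minimal sketch of projected SARSA with linear function approximation on a hypothetical toy MDP. Everything below (the one-hot features, the softmax policy-improvement operator, the random two-state MDP, the l2-ball projection radius) is an illustrative assumption, not a construction from the paper; the softmax temperature is one way the Lipschitz constant of the policy improvement operator can vary.

```python
import numpy as np

N_STATES, N_ACTIONS = 2, 2  # hypothetical toy MDP size

def features(s, a):
    # One-hot features over (state, action) pairs: an illustrative
    # linear-function-approximation choice (tabular as a special case).
    phi = np.zeros(N_STATES * N_ACTIONS)
    phi[s * N_ACTIONS + a] = 1.0
    return phi

def softmax_action(w, s, rng, temperature=1.0):
    # Softmax policy improvement operator; its Lipschitz constant
    # grows as the temperature shrinks.
    q = np.array([w @ features(s, a) for a in range(N_ACTIONS)])
    p = np.exp((q - q.max()) / temperature)
    p /= p.sum()
    return rng.choice(N_ACTIONS, p=p)

def project(w, radius):
    # Projection onto the l2 ball of the given radius -- the "region
    # we project into" in the abstract's terminology.
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def projected_sarsa(steps=5000, alpha=0.1, gamma=0.9, radius=10.0, seed=0):
    rng = np.random.default_rng(seed)
    # Fixed random transition kernel and bounded rewards (assumed).
    P = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))
    R = rng.uniform(-1.0, 1.0, size=(N_STATES, N_ACTIONS))
    w = np.zeros(N_STATES * N_ACTIONS)
    s = 0
    a = softmax_action(w, s, rng)
    for _ in range(steps):
        s2 = rng.choice(N_STATES, p=P[s, a])
        a2 = softmax_action(w, s2, rng)
        # Semi-gradient SARSA update, then projection of the weights.
        td = R[s, a] + gamma * (w @ features(s2, a2)) - w @ features(s, a)
        w = project(w + alpha * td * features(s, a), radius)
        s, a = s2, a2
    return w
```

By construction the iterates never leave the projection ball; the paper's result concerns how fast they reach, and how small is, the region in which they then oscillate.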
