Paper Title
Finite-Sample Analysis of Proximal Gradient TD Algorithms
Paper Authors
Paper Abstract
In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms. Previous analyses of this class of algorithms use ODE techniques to prove asymptotic convergence, and to the best of our knowledge, no finite-sample analysis has been done. Moreover, there has been little work on finite-sample analysis of convergent off-policy reinforcement learning algorithms. In this paper, we formulate GTD methods as stochastic gradient algorithms w.r.t. a primal-dual saddle-point objective function, and then conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Two revised algorithms are also proposed, namely projected GTD2 and GTD2-MP, which offer improved convergence guarantees and acceleration, respectively. The results of our theoretical analysis show that the GTD family of algorithms is indeed comparable to the existing LSTD methods in off-policy learning scenarios.
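For context, the following is a minimal sketch of the kind of primal-dual saddle-point objective the abstract refers to, as it commonly appears in the GTD literature; the notation (features $\phi_t$, reward $r_t$, discount $\gamma$, importance weight $\rho_t$) is assumed here rather than taken from the abstract itself.

% Assumed notation: expectations are over the off-policy sampling distribution.
\[
  A = \mathbb{E}\!\left[\rho_t\,\phi_t(\phi_t - \gamma\phi_{t+1})^{\top}\right],\qquad
  b = \mathbb{E}\!\left[\rho_t\, r_t\,\phi_t\right],\qquad
  M = \mathbb{E}\!\left[\phi_t\phi_t^{\top}\right].
\]
% Minimizing a quadratic objective proportional to the MSPBE is equivalent,
% by convex conjugacy of the quadratic term, to solving a saddle-point problem:
\[
  \min_{\theta}\ \tfrac{1}{2}\,\lVert A\theta - b\rVert_{M^{-1}}^{2}
  \;=\;
  \min_{\theta}\,\max_{y}\ \Bigl(\langle\, b - A\theta,\; y \,\rangle \;-\; \tfrac{1}{2}\lVert y\rVert_{M}^{2}\Bigr).
\]
Running stochastic gradient descent in $\theta$ and ascent in $y$ on this objective yields GTD2-style updates, which is the primal-dual view the abstract refers to; the exact scaling and constants here are illustrative, not quoted from the paper.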