论文标题

线性两次尺度随机近似的有限时间分析与马尔可夫噪声

Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise

论文作者

Kaledin, Maxim, Moulines, Eric, Naumov, Alexey, Tadic, Vladislav, Wai, Hoi-To

论文摘要

线性两次尺度随机近似(SA)方案是一类重要的算法,在强化学习(RL)中已经流行,尤其是对于政策评估问题。最近,许多作品致力于建立该方案的有限时间分析,尤其是在Markovian(non-i.i.d。)噪声设置下,这些噪声设置在实践中无处不在。在本文中,我们为线性两次尺度SA提供了有限的时间分析。我们的边界表明,马尔可夫噪声和马尔丁勒噪声之间的收敛速度没有差异,只有常数受马尔可夫链的混合时间的影响。有了适当的步长时间表,预期错误中的瞬态项为$ O(1/k^c)$,稳态项为$ {\ cal o}(1/k)$,其中$ c> 1 $和$ k $是迭代号码。此外,我们提出了预期误差的渐近扩展,匹配的下限为$ω(1/k)$。提出了一个简单的数值实验来支持我们的理论。

Linear two-timescale stochastic approximation (SA) scheme is an important class of algorithms which has become popular in reinforcement learning (RL), particularly for the policy evaluation problem. Recently, a number of works have been devoted to establishing the finite time analysis of the scheme, especially under the Markovian (non-i.i.d.) noise settings that are ubiquitous in practice. In this paper, we provide a finite-time analysis for linear two timescale SA. Our bounds show that there is no discrepancy in the convergence rate between Markovian and martingale noise, only the constants are affected by the mixing time of the Markov chain. With an appropriate step size schedule, the transient term in the expected error bound is $o(1/k^c)$ and the steady-state term is ${\cal O}(1/k)$, where $c>1$ and $k$ is the iteration number. Furthermore, we present an asymptotic expansion of the expected error with a matching lower bound of $Ω(1/k)$. A simple numerical experiment is presented to support our theory.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源