Mutation-Driven Follow the Regularized Leader for Last-Iterate Convergence in Zero-Sum Games

Paper Title

Mutation-Driven Follow the Regularized Leader for Last-Iterate Convergence in Zero-Sum Games

Authors

Kenshi Abe, Mitsuki Sakamoto, Atsushi Iwasaki

Abstract

In this study, we consider a variant of the Follow the Regularized Leader (FTRL) dynamics in two-player zero-sum games. FTRL is guaranteed to converge to a Nash equilibrium when the strategies are time-averaged, whereas many of its variants suffer from limit-cycling behavior, i.e., they lack a last-iterate convergence guarantee. To this end, we propose mutant FTRL (M-FTRL), an algorithm that introduces mutation to perturb the action probabilities. We then investigate the continuous-time dynamics of M-FTRL and provide strong convergence guarantees toward stationary points that approximate Nash equilibria under full-information feedback. Furthermore, our simulations demonstrate that M-FTRL can enjoy faster convergence rates than FTRL and optimistic FTRL under full-information feedback and, surprisingly, exhibits clear convergence under bandit feedback.
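
The contrast the abstract draws between plain FTRL and a mutation-perturbed variant can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation. It assumes an entropy regularizer (so each strategy is a softmax of accumulated scores), full-information feedback, and a mutation term of the form mu * (c / pi(a) - 1) added to each action's payoff; that form is an assumption made for illustration and is not stated in the abstract. The game (rock-paper-scissors), the step size `eta`, the mutation rate `mu`, the uniform reference `c`, the starting scores, and the helper names are all hypothetical choices. The abstract analyzes continuous-time dynamics, whereas this sketch takes discrete Euler-style steps.

```python
import math

# Rock-paper-scissors payoffs for player 1; player 2 receives the negation.
# The game itself is an illustrative choice, not taken from the abstract.
A = [[0.0, -1.0, 1.0],
     [1.0, 0.0, -1.0],
     [-1.0, 1.0, 0.0]]
N = 3


def softmax(z):
    """Strategy induced by entropy-regularized FTRL from cumulative scores z."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def action_values(p1, p2):
    """Expected payoff of each action against the opponent's mixed strategy."""
    q1 = [sum(A[a][b] * p2[b] for b in range(N)) for a in range(N)]
    q2 = [-sum(A[a][b] * p1[a] for a in range(N)) for b in range(N)]
    return q1, q2


def exploitability(p1, p2):
    """Sum of both players' best-response gains; zero exactly at a Nash equilibrium."""
    q1, q2 = action_values(p1, p2)
    return max(q1) + max(q2)


def run(mu, steps=2000, eta=0.1):
    """Simulate the dynamics and return the exploitability of the last iterate.

    mu = 0 gives plain FTRL; mu > 0 adds the assumed mutation term
    mu * (c / pi(a) - 1), which pulls each strategy toward the uniform
    reference c.
    """
    z1, z2 = [1.0, 0.0, 0.0], [0.0, 0.5, 0.0]  # hypothetical starting scores
    c = 1.0 / N
    for _ in range(steps):
        p1, p2 = softmax(z1), softmax(z2)
        q1, q2 = action_values(p1, p2)
        for z, q, p in ((z1, q1, p1), (z2, q2, p2)):
            for a in range(N):
                mutation = mu * (c / p[a] - 1.0) if mu > 0 else 0.0
                z[a] += eta * (q[a] + mutation)
    return exploitability(softmax(z1), softmax(z2))


if __name__ == "__main__":
    print("FTRL   last-iterate exploitability:", run(mu=0.0))
    print("M-FTRL last-iterate exploitability:", run(mu=0.1))
```

In rock-paper-scissors the uniform strategy is both the Nash equilibrium and the chosen mutation reference, so the perturbed stationary point coincides with the equilibrium here. In general the abstract only claims that stationary points approximate Nash equilibria.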
