Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning

Paper title

Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning

Authors

Wang, Lingxiao; Yang, Zhuoran; Wang, Zhaoran

Abstract

Multi-agent reinforcement learning (MARL) has achieved significant empirical success. However, MARL suffers from the curse of many agents. In this paper, we exploit the symmetry of agents in MARL. In its most generic form, we study a mean-field MARL problem. Such a mean-field MARL problem is defined on mean-field states, which are distributions supported on a continuous space. Based on the mean embedding of these distributions, we propose the MF-FQI algorithm, which solves the mean-field MARL problem, and we establish a non-asymptotic analysis of it. We highlight that the MF-FQI algorithm enjoys a "blessing of many agents" property, in the sense that a larger number of observed agents improves its performance.
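As a rough illustration of the mean-embedding idea mentioned in the abstract, the sketch below maps a set of observed agent states to a fixed-length vector by averaging a feature map over the agents. It is not the authors' implementation: the random Fourier feature map (approximating a Gaussian kernel), the bandwidth, and the names `make_feature_map` and `mean_embedding` are all assumptions chosen for illustration. The point it shows is that the embedding depends only on the empirical distribution of the states, so reordering the agents leaves it unchanged.

```python
import numpy as np


def make_feature_map(state_dim, num_features, bandwidth=1.0, seed=0):
    """Build a random Fourier feature map (assumed choice, approximates a Gaussian kernel)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / bandwidth, size=(state_dim, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    def phi(states):
        # states: (num_agents, state_dim) -> features: (num_agents, num_features)
        return np.sqrt(2.0 / num_features) * np.cos(states @ W + b)

    return phi


def mean_embedding(states, phi):
    """Empirical mean embedding: average the features over the observed agents."""
    return phi(states).mean(axis=0)


# Hypothetical data: 100 agents with 2-dimensional states.
phi = make_feature_map(state_dim=2, num_features=64)
rng = np.random.default_rng(1)
states = rng.normal(size=(100, 2))

embedding = mean_embedding(states, phi)
# Permuting the agents does not change the embedding (agent symmetry).
shuffled = mean_embedding(states[rng.permutation(100)], phi)
```

In a fitted Q-iteration setup, a vector like `embedding` could serve as the input representation of the mean-field state for a regression step; how MF-FQI actually does this is specified in the paper, not here.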
