Paper Title

Independent and Decentralized Learning in Markov Potential Games

Authors

Chinmay Maheshwari, Manxi Wu, Druv Pai, Shankar Sastry

Abstract

We study multi-agent reinforcement learning dynamics and analyze their asymptotic behavior in infinite-horizon discounted Markov potential games. We focus on the independent and decentralized setting, where players do not know the game parameters and cannot communicate or coordinate. In each stage, players update, in an asynchronous manner, their estimates of the Q-function that evaluates their total contingent payoff based on the realized one-stage reward. Then, players independently update their policies by incorporating an optimal one-stage deviation strategy based on the estimated Q-function. Inspired by the actor-critic algorithm in single-agent reinforcement learning, a key feature of our learning dynamics is that agents update their Q-function estimates at a faster timescale than their policies. Leveraging tools from two-timescale asynchronous stochastic approximation theory, we characterize the convergent set of the learning dynamics.
