Paper Title

Reinforcement Learning in Linear Quadratic Deep Structured Teams: Global Convergence of Policy Gradient Methods

Paper Authors

Vida Fathi, Jalal Arabneydi, Amir G. Aghdam

Paper Abstract

In this paper, we study the global convergence of model-based and model-free policy gradient descent and natural policy gradient descent algorithms for linear quadratic deep structured teams. In such systems, agents are partitioned into a few sub-populations wherein the agents in each sub-population are coupled in the dynamics and cost function through a set of linear regressions of the states and actions of all agents. Every agent observes its local state and the linear regressions of states, called deep states. For a sufficiently small risk factor and/or sufficiently large population, we prove that model-based policy gradient methods globally converge to the optimal solution. Given an arbitrary number of agents, we develop model-free policy gradient and natural policy gradient algorithms for the special case of risk-neutral cost function. The proposed algorithms are scalable with respect to the number of agents due to the fact that the dimension of their policy space is independent of the number of agents in each sub-population. Simulations are provided to verify the theoretical results.
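
For intuition only, the sketch below illustrates the structural idea behind the abstract's scalability claim: every agent applies the same gains to its local state and to the "deep state" (here simplified to the population mean), so the policy has a fixed dimension regardless of the number of agents. This is a minimal assumed example, not the authors' algorithm; the scalar dynamics, cost weights, population size, and zeroth-order gradient estimator are all illustrative choices.

```python
# Minimal sketch (assumed setup, not the paper's exact algorithm): a model-free
# zeroth-order policy gradient for a scalar LQ team with one sub-population,
# where each agent uses u_i = -k1*x_i - k2*xbar and xbar is the deep state.
import numpy as np

n = 50          # number of agents (assumed)
T = 30          # rollout horizon (assumed)
a, b = 0.9, 0.5         # local dynamics coefficients (assumed)
abar, bbar = 0.1, 0.2   # coupling through deep state / deep action (assumed)
q, r = 1.0, 0.1         # local cost weights (assumed)
qbar, rbar = 0.5, 0.05  # deep-state / deep-action cost weights (assumed)

def rollout_cost(theta, rng):
    """Average per-agent cost of the linear policy u_i = -k1*x_i - k2*xbar."""
    k1, k2 = theta
    x = rng.normal(size=n)                      # random initial local states
    cost = 0.0
    for _ in range(T):
        xbar = x.mean()                         # deep state
        u = -k1 * x - k2 * xbar                 # identical gains for all agents
        ubar = u.mean()                         # deep action
        cost += np.mean(q * x**2 + r * u**2) + qbar * xbar**2 + rbar * ubar**2
        x = a * x + b * u + abar * xbar + bbar * ubar
    return cost / T

# Zeroth-order (smoothed) policy gradient: the policy has only 2 parameters,
# independent of the number of agents n.
rng = np.random.default_rng(0)
theta = np.zeros(2)
sigma, lr, iters = 0.05, 0.02, 200
for _ in range(iters):
    d = rng.normal(size=2)
    grad = (rollout_cost(theta + sigma * d, rng)
            - rollout_cost(theta - sigma * d, rng)) / (2 * sigma) * d
    theta -= lr * grad

print("learned gains (k1, k2):", theta)
```

The two learned gains play the role of the low-dimensional policy parameters discussed in the abstract; in the paper's general setting there would be one such gain pair per sub-population and per linear regression, still independent of the population size.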
