Paper title

MA-Dreamer: Coordination and communication through shared imagination

Authors

Kenzo Lobos-Tsunekawa, Akshay Srinivasan, Michael Spranger

Abstract

Multi-agent RL is rendered difficult by the non-stationary nature of the environment as perceived by individual agents. Theoretically sound methods using the REINFORCE estimator are impeded by its high variance, whereas value-function-based methods are affected by issues stemming from their ad hoc handling of situations like inter-agent communication. Methods like MADDPG are further constrained by their requirement of centralized critics, among other things. To address these issues, we present MA-Dreamer, a model-based method that uses both agent-centric and global differentiable models of the environment to train decentralized agents' policies and critics using model rollouts, a.k.a. "imagination". Since only the model training is done off-policy, inter-agent communication/coordination and "language emergence" can be handled in a straightforward manner. We compare the performance of MA-Dreamer with other methods on two soccer-based games. Our experiments show that in long-term speaker-listener tasks and in cooperative games with strong partial observability, MA-Dreamer finds a solution that makes effective use of coordination, whereas competing methods obtain marginal scores and fail outright, respectively. By effectively achieving coordination and communication under more relaxed and general conditions, our method opens the door to the study of more complex problems and population-based training.
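The abstract states that critics are trained on model rollouts ("imagination") rather than on replayed real experience. As a rough illustration only, the sketch below computes the λ-return targets that the single-agent Dreamer algorithm uses to train its critic on an imagined trajectory of predicted rewards and value estimates. That MA-Dreamer uses exactly this target is an assumption suggested by its name, not something the abstract confirms; the function name `lambda_returns` and its parameters are hypothetical.

```python
from typing import List


def lambda_returns(
    rewards: List[float],
    values: List[float],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Dreamer-style lambda-returns over one imagined trajectory (sketch).

    rewards[t]: reward predicted by the (hypothetical) world model, t = 0..H-1
    values[t]:  critic estimate v(s_t) for t = 0..H; values[H] is the bootstrap
    Recursion: R_t = r_t + gamma * ((1 - lam) * v_{t+1} + lam * R_{t+1}),
    with R_H = v_H.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have one more entry than rewards (bootstrap)")
    returns = [0.0] * len(rewards)
    next_return = values[-1]  # R_H = v(s_H): bootstrap from the last value estimate
    # Walk backwards so each R_t can reuse the already computed R_{t+1}.
    for t in reversed(range(len(rewards))):
        next_return = rewards[t] + gamma * (
            (1.0 - lam) * values[t + 1] + lam * next_return
        )
        returns[t] = next_return
    return returns


if __name__ == "__main__":
    # lam = 1: discounted sum of imagined rewards plus the bootstrap value.
    print(lambda_returns([1.0, 1.0], [0.0, 0.0, 10.0], gamma=0.5, lam=1.0))
    # lam = 0: one-step TD targets r_t + gamma * v(s_{t+1}).
    print(lambda_returns([1.0, 1.0], [0.0, 2.0, 10.0], gamma=0.5, lam=0.0))
```

In a decentralized multi-agent setting, one plausible arrangement (again an assumption) is for each agent to compute its own targets from its own predicted rewards and critic values along the shared imagined rollout, so that no centralized critic is required.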