Value-based CTDE Methods in Symmetric Two-team Markov Game: from Cooperation to Team Competition

Paper title

Value-based CTDE Methods in Symmetric Two-team Markov Game: from Cooperation to Team Competition

Authors

Pascal Leroy, Jonathan Pisane, Damien Ernst

Abstract

In this paper, we identify the best learning scenario to train a team of agents to compete against multiple possible strategies of opposing teams. We evaluate cooperative value-based methods in a mixed cooperative-competitive environment. We restrict ourselves to the case of a symmetric, partially observable, two-team Markov game. We selected three training methods based on the centralised training and decentralised execution (CTDE) paradigm: QMIX, MAVEN and QVMix. For each method, we considered three learning scenarios differentiated by the variety of team policies encountered during training. For our experiments, we modified the StarCraft Multi-Agent Challenge environment to create competitive environments where both teams could learn and compete simultaneously. Our results suggest that training against multiple evolving strategies achieves the best results when, for scoring their performances, teams are faced with several strategies.
