Diversifying Message Aggregation in Multi-Agent Communication via Normalized Tensor Nuclear Norm Regularization

Paper Title

Diversifying Message Aggregation in Multi-Agent Communication via Normalized Tensor Nuclear Norm Regularization

Authors

Zhai, Yuanzhao; Xu, Kele; Ding, Bo; Feng, Dawei; Gao, Zijian; Wang, Huaimin

Abstract

Aggregating messages is a key component of communication in multi-agent reinforcement learning (Comm-MARL). Graph attention networks (GAT) have recently become prevalent in Comm-MARL: agents are represented as nodes, and messages are aggregated via weighted message passing. Despite this success, GAT can lead to homogeneous message-aggregation strategies, in which a "core" agent may excessively influence other agents' behaviors, which can severely limit multi-agent coordination. To address this challenge, we first study the adjacency tensor of the communication graph and show that the homogeneity of message aggregation can be measured by the normalized tensor rank. Since rank optimization is known to be NP-hard, we define a new nuclear norm, a convex surrogate of the normalized tensor rank, to replace the rank. Leveraging this norm, we further propose a plug-and-play regularizer on the adjacency tensor, named Normalized Tensor Nuclear Norm Regularization (NTNNR), to actively enrich the diversity of message aggregation during training. We extensively evaluate GAT with the proposed regularizer in both cooperative and mixed cooperative-competitive scenarios. The results demonstrate that aggregating messages with NTNNR-enhanced GAT can improve training efficiency and achieve higher asymptotic performance than existing message aggregation methods. When NTNNR is applied to existing graph-attention Comm-MARL methods, we also observe significant performance improvements on the StarCraft II micromanagement benchmarks.
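
The abstract does not define the paper's normalized tensor rank or its new nuclear norm, so the sketch below only illustrates the underlying intuition with the standard matrix nuclear norm (sum of singular values), the usual convex surrogate for matrix rank. If every agent aggregates messages with the same attention weights (for example, all attending to one "core" agent), the attention matrix has rank 1; more diverse aggregation raises the rank. Assumptions, all ours and not the authors': the adjacency tensor is a (K, N, N) stack of row-stochastic GAT attention matrices, each slice is scored by the ratio ||A||_* / ||A||_F, and the names `normalized_nuclear_ratio`, `diversity_penalty`, and the `weight` coefficient are hypothetical. This is not NTNNR as defined in the paper.

```python
import numpy as np


def softmax_rows(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax: row i holds agent i's aggregation weights over all agents."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def normalized_nuclear_ratio(attn: np.ndarray) -> float:
    """Return ||A||_* / ||A||_F for a single N x N attention matrix.

    ||A||_* (sum of singular values) is the standard convex surrogate for
    matrix rank. Dividing by the Frobenius norm makes the score scale-free:
    it equals 1 for a rank-1 matrix (every agent aggregates identically)
    and is at most sqrt(rank(A)) by the Cauchy-Schwarz inequality.
    """
    singular_values = np.linalg.svd(attn, compute_uv=False)
    frobenius = np.sqrt(np.sum(singular_values ** 2))
    return float(np.sum(singular_values) / frobenius)


def diversity_penalty(adj_tensor: np.ndarray, weight: float = 0.01) -> float:
    """Hypothetical regularizer for a (K, N, N) stack of attention matrices.

    Returns -weight * (mean ratio over the K slices), so adding it to a loss
    that is minimized rewards more diverse (higher-rank) aggregation.
    Illustrative assumption only; not the paper's NTNNR.
    """
    ratios = [normalized_nuclear_ratio(a) for a in adj_tensor]
    return -weight * float(np.mean(ratios))


if __name__ == "__main__":
    n_agents = 4
    # Homogeneous: every agent puts most of its weight on the same "core" agent 0.
    homogeneous = softmax_rows(np.tile([5.0, 0.0, 0.0, 0.0], (n_agents, 1)))
    # Diverse: each agent mostly attends to itself, giving near-identity attention.
    diverse = softmax_rows(10.0 * np.eye(n_agents))

    print(normalized_nuclear_ratio(homogeneous))  # about 1.0 (rank 1)
    print(normalized_nuclear_ratio(diverse))      # just under 2.0 = sqrt(4)
    print(diversity_penalty(np.stack([homogeneous, diverse]), weight=1.0))
```

The NumPy version cannot be backpropagated through; in an actual training loop the same quantity would have to be computed in an autodiff framework (for instance from PyTorch's `torch.linalg.svdvals`) and weighted against the reinforcement-learning loss. Dividing by the Frobenius norm is just one simple way to remove the dependence on scale; the paper's own normalization may differ.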
