Paper Title

Multi-Agent Reinforcement Learning for Problems with Combined Individual and Team Reward

Authors

Sheikh, Hassam Ullah, Bölöni, Ladislau

Abstract

Many cooperative multi-agent problems require agents to learn individual tasks while contributing to the collective success of the group. This is a challenging task for current state-of-the-art multi-agent reinforcement learning algorithms, which are designed to maximize either the global reward of the team or the individual local rewards. The problem is exacerbated when either of the rewards is sparse, leading to unstable learning. To address this problem, we present Decomposed Multi-Agent Deep Deterministic Policy Gradient (DE-MADDPG): a novel cooperative multi-agent reinforcement learning framework that simultaneously learns to maximize the global and local rewards. We evaluate our solution on the challenging defensive escort team problem and show that our solution achieves significantly better and more stable performance than the direct adaptation of the MADDPG algorithm.
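The core idea of the abstract, learning one policy from two critics at once, can be illustrated with a minimal sketch. This is not the paper's implementation: all names, the linear policy, and the toy quadratic critics are illustrative assumptions. It only shows how gradients from a global (centralized) critic and a local critic can be combined into a single policy update.

```python
import numpy as np

# Hedged toy sketch of the decomposed-critic update (illustrative only,
# not the paper's code): one deterministic policy a = theta * s is
# updated along the combined gradient of a global critic and a local
# critic, so both reward streams shape the same policy.

g, l = 1.0, 3.0        # actions preferred by the global vs. local reward
theta = 0.0            # policy parameter
s = 1.0                # fixed state, for illustration
lr = 0.1               # learning rate

for _ in range(200):
    a = theta * s
    # Toy quadratic critics: Q_global(a) = -(a - g)^2, Q_local(a) = -(a - l)^2.
    # Chain rule through da/dtheta = s for each critic's gradient.
    grad_global = -2.0 * (a - g) * s
    grad_local = -2.0 * (a - l) * s
    # Combined ascent direction: maximize global and local reward together.
    theta += lr * (grad_global + grad_local)

# The policy settles between the two optima, balancing both objectives.
print(round(theta * s, 3))
```

In this toy case the combined update converges to the action midway between the two critics' optima; in DE-MADDPG the same principle is applied with learned neural critics, where the global critic is centralized over all agents and the local critic uses only the agent's own observation and action.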
