Paper Title
Learning Progress Driven Multi-Agent Curriculum
Authors
Abstract
The number of agents can be an effective curriculum variable for controlling the difficulty of multi-agent reinforcement learning (MARL) tasks. Existing work typically uses manually defined curricula such as linear schemes. We identify two potential flaws when applying existing reward-based automatic curriculum learning methods to MARL: (1) the expected episode return used to measure task difficulty has high variance; (2) credit assignment difficulty can be exacerbated in tasks where increasing the number of agents yields higher returns, which is common in many MARL tasks. To address these issues, we propose to control the curriculum with a TD-error-based *learning progress* measure and to let the curriculum proceed from an initial context distribution to the final task-specific one. Since our approach maintains a distribution over the number of agents and measures learning progress rather than absolute performance, which often increases with the number of agents, we alleviate problem (2). Moreover, the learning progress measure naturally alleviates problem (1) by aggregating returns. On three challenging sparse-reward MARL benchmarks, our approach outperforms state-of-the-art baselines.
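To make the mechanism concrete, here is a minimal sketch of how a TD-error-based learning-progress signal could drive a sampling distribution over the number of agents. This is an illustrative assumption, not the paper's implementation: the class name `LearningProgressCurriculum`, the EMA smoothing constants, and the softmax sampling rule are ours, and the scheduled interpolation from the initial context distribution to the final task-specific one is omitted.

```python
import numpy as np

class LearningProgressCurriculum:
    """Minimal sketch (illustrative, not the paper's exact formulation):
    sample the number of agents from a curriculum distribution driven by
    a TD-error-based learning-progress signal."""

    def __init__(self, agent_counts, ema_beta=0.9, temperature=1.0):
        self.agent_counts = list(agent_counts)  # candidate context values, e.g. [2, 4, 8]
        self.ema_beta = ema_beta                # smoothing for the slow EMA baseline
        self.temperature = temperature          # softmax temperature over progress
        k = len(self.agent_counts)
        self.fast_ema = np.zeros(k)             # fast-moving average of |TD error|
        self.slow_ema = np.zeros(k)             # slow-moving average of |TD error|

    def update(self, context_idx, td_errors):
        """Aggregate |TD errors| from episodes played with this agent count;
        averaging over many transitions tames the high variance of raw returns."""
        signal = float(np.mean(np.abs(td_errors)))
        self.fast_ema[context_idx] = 0.5 * self.fast_ema[context_idx] + 0.5 * signal
        self.slow_ema[context_idx] = (self.ema_beta * self.slow_ema[context_idx]
                                      + (1.0 - self.ema_beta) * signal)

    def sample(self, rng=np.random):
        """Prefer contexts where the TD-error statistic is still changing,
        i.e. where learning progress (not absolute performance) is largest."""
        progress = np.abs(self.fast_ema - self.slow_ema)
        logits = progress / self.temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        idx = rng.choice(len(self.agent_counts), p=probs)
        return self.agent_counts[idx]
```

Because the sampling rule depends only on the *change* in the TD-error statistic, a context whose return is high simply because it has more agents gets no inherent preference, which is the intuition behind alleviating problem (2) above.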