Paper Title

High Performance Simulation for Scalable Multi-Agent Reinforcement Learning

Paper Authors

Jordan Langham-Lopez, Sebastian M. Schmon, Patrick Cannon

Paper Abstract

Multi-agent reinforcement learning experiments and open-source training environments are typically limited in scale, supporting tens or sometimes up to hundreds of interacting agents. In this paper we demonstrate the use of Vogue, a high performance agent based model (ABM) framework. Vogue serves as a multi-agent training environment, supporting thousands to tens of thousands of interacting agents while maintaining high training throughput by running both the environment and reinforcement learning (RL) agents on the GPU. High performance multi-agent environments at this scale have the potential to enable the learning of robust and flexible policies for use in ABMs and simulations of complex systems. We demonstrate training performance with two newly developed, large scale multi-agent training environments. Moreover, we show that these environments can train shared RL policies on time-scales of minutes and hours.
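The core performance idea in the abstract is that both the environment state and the shared policy live on the GPU, so thousands of agents can be stepped and acted on with batched tensor operations rather than per-agent Python loops. The sketch below illustrates this pattern in PyTorch; it is not the Vogue API, and all names (ToyBatchedEnv, SharedPolicy, n_agents, obs_dim) and the toy dynamics/reward are illustrative assumptions.

```python
# Minimal sketch (hypothetical, not the Vogue API): environment state and a
# shared policy both live on the GPU, so all agents are stepped in one batch.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"


class ToyBatchedEnv:
    """Toy environment whose per-agent state lives in a single GPU tensor."""

    def __init__(self, n_agents: int, obs_dim: int = 8):
        self.n_agents = n_agents
        self.obs_dim = obs_dim
        self.state = torch.zeros(n_agents, obs_dim, device=device)

    def reset(self) -> torch.Tensor:
        self.state.normal_()  # random initial observations for every agent
        return self.state

    def step(self, actions: torch.Tensor):
        # Batched dynamics: every agent is updated by one tensor operation,
        # so stepping 10,000 agents is a handful of kernel launches.
        self.state = 0.9 * self.state + 0.1 * torch.randn_like(self.state)
        # Placeholder per-agent reward; a real ABM would derive this from
        # agent interactions.
        rewards = -self.state.abs().mean(dim=-1)
        return self.state, rewards


class SharedPolicy(torch.nn.Module):
    """One policy network shared by all agents, applied to a batch of observations."""

    def __init__(self, obs_dim: int = 8, n_actions: int = 4):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(obs_dim, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        logits = self.net(obs)
        return torch.distributions.Categorical(logits=logits).sample()


if __name__ == "__main__":
    env = ToyBatchedEnv(n_agents=10_000)
    policy = SharedPolicy().to(device)
    obs = env.reset()
    for _ in range(100):
        with torch.no_grad():
            actions = policy(obs)  # one forward pass covers all 10,000 agents
        obs, rewards = env.step(actions)
```

Because observations, actions, and rewards stay on the device throughout, there is no per-step host-device transfer, which is what makes the "thousands of agents at high training throughput" claim plausible in this style of setup.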
