Paper Title
EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine
Paper Authors
Paper Abstract
There has been significant progress in developing reinforcement learning (RL) training systems. Past works such as IMPALA, Ape-X, SEED RL, and Sample Factory aim to improve the system's overall throughput. In this paper, we address a common bottleneck in RL training systems, namely parallel environment execution, which is often the slowest part of the whole system but receives little attention. With a design tailored to parallelizing RL environments, we have improved RL environment simulation speed across different hardware setups, ranging from a laptop and a modest workstation to a high-end machine such as the NVIDIA DGX-A100. On a high-end machine, EnvPool achieves one million frames per second for environment execution on Atari environments and three million frames per second on MuJoCo environments. When running on a laptop, EnvPool is 2.8 times as fast as Python subprocess-based execution. Moreover, the open-source community has demonstrated strong compatibility with existing RL training libraries, including CleanRL, rl_games, and DeepMind Acme. Finally, EnvPool allows researchers to iterate on their ideas at a much faster pace and has great potential to become the de facto RL environment execution engine. Example runs show that it takes only five minutes to train agents to play Atari Pong and MuJoCo Ant on a laptop. EnvPool is open-sourced at https://github.com/sail-sg/envpool.
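The throughput figures above are stated in frames per second. As a rough illustration of how such a number can be measured, the following minimal sketch steps a batch of toy environments through a worker pool and divides the frames simulated by the wall-clock time. It does not use EnvPool: `ToyEnv`, `run_batched`, and `frames_per_second` are hypothetical names introduced here, and the frame skip of 4 is an assumed value. Python threads do not give real parallelism for CPU-bound work because of the global interpreter lock, so the printed number says nothing about EnvPool's own performance.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class ToyEnv:
    """Hypothetical stand-in for an RL environment (not an EnvPool class)."""

    def __init__(self, frame_skip=4):
        self.frame_skip = frame_skip  # frames simulated per step() call (assumed value)
        self.state = 0

    def step(self, action):
        # Advance the toy simulation by `frame_skip` frames.
        for _ in range(self.frame_skip):
            self.state = (self.state * 31 + action + 1) % 1_000_003
        return self.state


def run_batched(envs, num_steps, workers=4):
    """Step every env once per iteration via a worker pool; return total frames."""
    frames = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(num_steps):
            # One action per environment; a real agent would compute these.
            actions = [0] * len(envs)
            list(pool.map(lambda pair: pair[0].step(pair[1]), zip(envs, actions)))
            frames += sum(env.frame_skip for env in envs)
    return frames


def frames_per_second(frames, seconds):
    """Throughput metric: simulated frames divided by wall-clock seconds."""
    return frames / seconds


envs = [ToyEnv() for _ in range(8)]
start = time.perf_counter()
total_frames = run_batched(envs, num_steps=100)
elapsed = time.perf_counter() - start
print(f"{total_frames} frames, {frames_per_second(total_frames, elapsed):.0f} FPS")
```

With 8 environments, 100 steps each, and 4 frames per step, the run simulates 3,200 frames in total; the reported FPS depends on the machine.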