Paper Title
MSRL: Distributed Reinforcement Learning with Dataflow Fragments
Paper Authors
Paper Abstract
Reinforcement learning (RL) trains many agents, which is resource-intensive and must scale to large GPU clusters. Different RL training algorithms offer different opportunities for distributing and parallelising the computation. Yet, current distributed RL systems tie the definition of RL algorithms to their distributed execution: they hard-code particular distribution strategies and only accelerate specific parts of the computation (e.g. policy network updates) on GPU workers. Fundamentally, current systems lack abstractions that decouple RL algorithms from their execution. We describe MindSpore Reinforcement Learning (MSRL), a distributed RL training system that supports distribution policies that govern how RL training computation is parallelised and distributed on cluster resources, without requiring changes to the algorithm implementation. MSRL introduces the new abstraction of a fragmented dataflow graph, which maps Python functions from an RL algorithm's training loop to parallel computational fragments. Fragments are executed on different devices by translating them to low-level dataflow representations, e.g. computational graphs as supported by deep learning engines, CUDA implementations or multi-threaded CPU processes. We show that MSRL subsumes the distribution strategies of existing systems, while scaling RL training to 64 GPUs.
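To make the core idea concrete, the following is a minimal sketch of what a fragmented dataflow graph could look like. It is not MSRL's actual API: the names Fragment, FragmentedDataflowGraph, register and apply_policy are hypothetical illustrations of the abstraction the abstract describes, where the algorithm is written once as plain Python functions and a separate distribution policy decides where each fragment executes.

from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Fragment:
    # One parallelisable unit of the RL training loop, e.g. experience
    # collection or a policy network update. (Hypothetical type, not MSRL's.)
    name: str
    fn: Callable
    device: str = "CPU:0"  # assigned later by a distribution policy

class FragmentedDataflowGraph:
    def __init__(self) -> None:
        self.fragments: Dict[str, Fragment] = {}

    def register(self, name: str, fn: Callable) -> None:
        # Map a Python function from the training loop to a fragment.
        self.fragments[name] = Fragment(name, fn)

    def apply_policy(self, policy: Dict[str, str]) -> None:
        # A distribution policy assigns fragments to devices without
        # changing the algorithm definition itself.
        for name, device in policy.items():
            self.fragments[name].device = device

# The algorithm is written once as ordinary Python functions:
graph = FragmentedDataflowGraph()
graph.register("collect_experience", lambda env, policy_net: ...)
graph.register("update_policy", lambda batch, policy_net: ...)

# Two distribution policies for the same algorithm definition:
single_node = {"collect_experience": "CPU:0", "update_policy": "GPU:0"}
multi_gpu = {"collect_experience": "GPU:0", "update_policy": "GPU:1"}
graph.apply_policy(multi_gpu)

In the real system, each fragment would additionally be translated to a low-level dataflow representation (a deep learning engine's computational graph, a CUDA implementation, or a multi-threaded CPU process) rather than run as plain Python, as the abstract explains.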