Paper Title

Memory-Efficient Pipeline-Parallel DNN Training

Authors

Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, Matei Zaharia

Abstract

Many state-of-the-art ML results have been obtained by scaling up the number of parameters in existing models. However, parameters and activations for such large models often do not fit in the memory of a single accelerator device; this means that it is necessary to distribute training of large models over multiple accelerators. In this work, we propose PipeDream-2BW, a system that supports memory-efficient pipeline parallelism. PipeDream-2BW uses a novel pipelining and weight gradient coalescing strategy, combined with the double buffering of weights, to ensure high throughput, low memory footprint, and weight update semantics similar to data parallelism. In addition, PipeDream-2BW automatically partitions the model over the available hardware resources, while respecting hardware constraints such as memory capacities of accelerators and interconnect topologies. PipeDream-2BW can accelerate the training of large GPT and BERT language models by up to 20× with similar final model accuracy.
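
To make the abstract's "2BW" (double-buffered weights) and weight-gradient-coalescing ideas concrete, here is a minimal, self-contained Python sketch. It is an illustration under simplifying assumptions, not the authors' implementation: the toy dot-product "model", the plain SGD update, and the `Stage2BW` / `step` names are hypothetical, and the real system applies these ideas to pipeline stages of a DNN rather than a single small weight vector.

```python
# Minimal sketch of double-buffered weight updates (2BW) with gradient coalescing.
# Assumptions (not from the paper's code): a toy linear "model" w . x, squared
# loss, and plain SGD; real PipeDream-2BW does this per pipeline stage.

class Stage2BW:
    def __init__(self, weights, window, lr=0.1):
        self.current = list(weights)       # newest weight version
        self.previous = list(weights)      # retained for in-flight microbatches
        self.window = window               # gradient coalescing window (microbatches)
        self.lr = lr
        self.grad_acc = [0.0] * len(weights)
        self.seen = 0

    def step(self, microbatch, use_previous=False):
        """Process one microbatch and coalesce its weight gradient.

        In a real pipeline, a microbatch that started its forward pass before
        the last version switch would pass use_previous=True so that its
        forward and backward passes see consistent weights.
        """
        w = self.previous if use_previous else self.current
        x, y = microbatch
        # Toy loss: 0.5 * (w . x - y)^2, so dL/dw = (w . x - y) * x
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        grad = [err * xi for xi in x]
        # Coalescing: accumulate into a single buffer instead of keeping
        # one gradient buffer per in-flight microbatch.
        self.grad_acc = [a + g for a, g in zip(self.grad_acc, grad)]
        self.seen += 1
        if self.seen == self.window:
            self._flush()

    def _flush(self):
        # Double buffering: keep the old version for microbatches still in
        # flight, and produce one new version per window, which gives weight
        # update semantics close to data parallelism.
        self.previous = self.current
        self.current = [w - self.lr * g / self.window
                        for w, g in zip(self.current, self.grad_acc)]
        self.grad_acc = [0.0] * len(self.current)
        self.seen = 0


if __name__ == "__main__":
    stage = Stage2BW(weights=[0.0, 0.0], window=4)
    for mb in [([1.0, 2.0], 3.0), ([2.0, 1.0], 3.0)] * 4:
        stage.step(mb)
    print("current:", stage.current, "previous:", stage.previous)
```

The key point the sketch tries to show is that only two weight versions and one coalesced gradient buffer are needed per stage, which is where the memory savings over schemes that keep a weight version per in-flight microbatch come from.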
