Paper Title
Adaptive Gradient Coding
Authors
Abstract
This paper focuses on mitigating the impact of stragglers in distributed learning systems. Unlike existing schemes, which are designed for a fixed number of stragglers, we develop a new scheme called Adaptive Gradient Coding (AGC) that flexibly tolerates a varying number of stragglers. Our scheme achieves an optimal tradeoff among computation load, straggler tolerance, and communication cost. In particular, it allows the communication cost to be minimized according to the real-time number of stragglers in practical environments. Implementations on Amazon EC2 clusters, using Python with the mpi4py package, verify this flexibility in several scenarios.
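For readers unfamiliar with the baseline that the abstract contrasts against, the following is a minimal sketch of a fixed-tolerance gradient code, assuming the well-known fractional-repetition construction. It is not the AGC scheme itself, whose construction the abstract does not describe. All function and variable names (`assign_partitions`, `worker_message`, `decode`, `n_workers`, `s`) are hypothetical, and gradients are plain Python lists rather than messages exchanged over MPI.

```python
from typing import Dict, List, Sequence


def assign_partitions(n_workers: int, s: int) -> List[List[int]]:
    """Assign s + 1 data partitions to each worker (fractional repetition).

    Assumption: there are as many partitions as workers, and n_workers is
    divisible by s + 1. Workers are split into groups of size s + 1, and all
    workers in a group hold the same block of s + 1 partitions.
    """
    group_size = s + 1
    if n_workers % group_size != 0:
        raise ValueError("n_workers must be divisible by s + 1")
    assignment = []
    for w in range(n_workers):
        start = (w // group_size) * group_size
        assignment.append(list(range(start, start + group_size)))
    return assignment


def worker_message(partial_grads: Sequence[Sequence[float]],
                   partitions: Sequence[int]) -> List[float]:
    """Sum the partial gradients of the partitions a worker holds."""
    dim = len(partial_grads[0])
    return [sum(partial_grads[p][d] for p in partitions) for d in range(dim)]


def decode(messages: Dict[int, List[float]], n_workers: int, s: int) -> List[float]:
    """Recover the full gradient from the workers that responded.

    `messages` maps worker index -> message; stragglers are simply absent.
    With at most s stragglers, every group of s + 1 workers has a survivor.
    """
    group_size = s + 1
    dim = len(next(iter(messages.values())))
    total = [0.0] * dim
    for g in range(n_workers // group_size):
        members = range(g * group_size, (g + 1) * group_size)
        survivor = next((w for w in members if w in messages), None)
        if survivor is None:
            raise RuntimeError("more than s stragglers in one group")
        total = [t + m for t, m in zip(total, messages[survivor])]
    return total


# Toy run: 4 workers, tolerance s = 1, 2-dimensional partial gradients.
partial_grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
assignment = assign_partitions(n_workers=4, s=1)
stragglers = {0, 3}  # these workers never report back
messages = {w: worker_message(partial_grads, assignment[w])
            for w in range(4) if w not in stragglers}
print(decode(messages, n_workers=4, s=1))  # prints [16.0, 20.0]
```

In this sketch the tolerance `s` is fixed when the data is assigned: each worker always computes `s + 1` partial gradients and sends a full-length vector, regardless of how many workers actually straggle. That rigidity is the limitation the abstract says AGC addresses by adapting the communication cost to the real-time number of stragglers.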