Paper Title
HPSGD: Hierarchical Parallel SGD With Stale Gradients Featuring
Paper Authors
Paper Abstract
While distributed training significantly speeds up the training process of deep neural networks (DNNs), cluster utilization remains relatively low due to the time-consuming data synchronization between workers. To alleviate this problem, a novel Hierarchical Parallel SGD (HPSGD) strategy is proposed, based on the observation that the data synchronization phase can be parallelized with the local training phase (i.e., feed-forward and back-propagation). Furthermore, an improved model updating method is utilized to remedy the introduced stale-gradient problem: updates are committed to a replica (i.e., a temporary model with the same parameters as the global model), and the average changes are then merged into the global model. Extensive experiments demonstrate that the proposed HPSGD approach substantially accelerates distributed DNN training, reduces the disturbance of stale gradients, and achieves better accuracy within a given fixed wall-time.
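To make the overlap described in the abstract concrete, below is a minimal, self-contained sketch in plain Python with NumPy and threads; it is not the authors' implementation, and all names (`all_reduce_mean`, `local_step`, `hpsgd_round`) are illustrative assumptions. A stand-in all-reduce of a previously computed (and therefore stale) gradient runs in a background thread while the worker keeps committing local SGD updates to a temporary replica of the global model; the replica's average change is then merged back into the global model.

```python
import threading
import numpy as np

def all_reduce_mean(grad):
    # Stand-in for a cluster-wide all-reduce; on a single process it simply
    # returns the gradient. In a real cluster this call would average the
    # gradient across workers and is the slow phase being overlapped.
    return grad

def local_step(params, data, lr=0.1):
    # One SGD step on a toy quadratic loss 0.5 * ||params - data||^2.
    grad = params - data
    return params - lr * grad, grad

def hpsgd_round(global_params, batches, lr=0.1):
    # One HPSGD-style round (sketch): synchronize a stale gradient in a
    # background thread while local training continues on a replica, then
    # merge the replica's average change into the global model.
    replica = global_params.copy()          # temporary model == global model
    synced = {}

    # Gradient from the first local step is handed to the background sync.
    _, first_grad = local_step(replica, batches[0], lr)
    sync = threading.Thread(
        target=lambda: synced.update(grad=all_reduce_mean(first_grad)))
    sync.start()

    # Local training keeps committing updates to the replica while the
    # synchronization is still in flight.
    for data in batches:
        replica, _ = local_step(replica, data, lr)

    sync.join()                             # synchronization finished

    # Merge step: apply the replica's average change plus the synchronized
    # (stale) gradient to the global model.
    avg_change = (replica - global_params) / len(batches)
    return global_params + avg_change - lr * synced["grad"]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = np.zeros(4)
    batches = [rng.normal(1.0, 0.1, size=4) for _ in range(8)]
    for _ in range(20):
        params = hpsgd_round(params, batches)
    print(params)  # drifts toward the batch mean (~1.0)
```

In an actual distributed setting the all-reduce would run on a separate communication thread or stream so that the costly inter-worker synchronization overlaps with feed-forward and back-propagation, which is the source of the utilization gain the abstract reports.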