Paper Title
Communication-efficient Decentralized Machine Learning over Heterogeneous Networks
Paper Authors
Paper Abstract
In the last few years, distributed machine learning has typically been executed over heterogeneous networks, such as a local area network within a multi-tenant cluster or a wide area network connecting data centers and edge clusters. In these heterogeneous networks, the link speeds among worker nodes vary significantly, making it challenging for state-of-the-art machine learning approaches to perform efficient training. Both centralized and decentralized training approaches suffer from low-speed links. In this paper, we propose a decentralized approach, named NetMax, that enables worker nodes to communicate via high-speed links and thus significantly speeds up the training process. NetMax possesses the following novel features. First, it consists of a novel consensus algorithm that allows worker nodes to train model copies on their local datasets asynchronously and to exchange information via peer-to-peer communication, rather than through a central master node (i.e., a parameter server), in order to synchronize their local copies. Second, in each iteration, every worker node randomly selects one peer to exchange information with, according to a fine-tuned probability; in particular, peers reachable over high-speed links are selected with high probability. Third, these peer-selection probabilities are designed to minimize the total convergence time. Moreover, we mathematically prove the convergence of NetMax. We evaluate NetMax on heterogeneous cluster networks and show that it achieves speedups of 3.7X, 3.4X, and 1.9X over the state-of-the-art decentralized training approaches Prague, Allreduce-SGD, and AD-PSGD, respectively.
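To make the peer-selection idea concrete, below is a minimal, illustrative Python sketch of probabilistic peer selection and pairwise model averaging as described in the abstract; it is not the paper's actual algorithm. The names `link_speeds`, `local_model`, and `peer_model` are hypothetical, and for simplicity the selection probabilities here are merely proportional to measured link speed, whereas NetMax derives them by solving an optimization that minimizes the total convergence time.

```python
# Illustrative sketch only: probability-weighted peer selection and
# pairwise averaging, loosely following the abstract's description.
import random

def selection_probabilities(link_speeds):
    """Map per-peer link speeds (e.g., Mbps) to selection probabilities.

    NOTE: proportional-to-speed weighting is a simplification used here
    for illustration; NetMax optimizes these probabilities to minimize
    the total convergence time.
    """
    total = sum(link_speeds.values())
    return {peer: speed / total for peer, speed in link_speeds.items()}

def pick_peer(probs):
    """Randomly choose one peer per iteration according to the probabilities."""
    peers, weights = zip(*probs.items())
    return random.choices(peers, weights=weights, k=1)[0]

def average_with_peer(local_model, peer_model):
    """Pairwise averaging step that synchronizes two local model copies."""
    return [(w_l + w_p) / 2.0 for w_l, w_p in zip(local_model, peer_model)]

# Example: a worker with three peers whose links have very different speeds.
link_speeds = {"worker_1": 10_000, "worker_2": 1_000, "worker_3": 100}  # Mbps
probs = selection_probabilities(link_speeds)
peer = pick_peer(probs)  # peers behind high-speed links are chosen far more often
print(peer, probs)
```

In this sketch, a worker would interleave local SGD steps on its own dataset with occasional `pick_peer` / `average_with_peer` exchanges, which is the asynchronous, peer-to-peer synchronization pattern the abstract contrasts with a central parameter server.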