Paper Title

Communication-efficient Decentralized Local SGD over Undirected Networks

Paper Authors

Tiancheng Qin, S. Rasoul Etesami, César A. Uribe

Paper Abstract

We consider the distributed learning problem where a network of $n$ agents seeks to minimize a global function $F$. Agents have access to $F$ through noisy gradients, and they can communicate locally with their neighbors over a network. We study the Decentralized Local SGD method, in which agents perform a number of local gradient steps and occasionally exchange information with their neighbors. Previous algorithmic analysis efforts have focused on a specific network topology (the star topology), where a leader node aggregates all agents' information. We generalize that setting to an arbitrary network by analyzing the trade-off between the number of communication rounds and the computational effort of each agent. We bound the expected optimality gap in terms of the number of iterations $T$, the number of workers $n$, and the spectral gap of the underlying network. Our main results show that, using only $R = \Omega(n)$ communication rounds, one can achieve an error that scales as $O(1/(nT))$, where the number of communication rounds is independent of $T$ and depends only on the number of agents. Finally, we provide numerical evidence for our theoretical results through experiments on real and synthetic data.
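
To make the algorithmic template concrete, here is a minimal Python sketch of Decentralized Local SGD on a toy quadratic objective. Everything in it (the ring network, the mixing matrix $W$, the step size, the local targets $a_i$, and the counts $n$, $H$, $R$) is an illustrative assumption rather than the paper's setup; it only shows the alternation the abstract describes between local noisy-gradient steps and an occasional averaging (gossip) round with neighbors.

```python
import numpy as np

# A minimal sketch of Decentralized Local SGD on a toy quadratic objective.
# All parameters below (n, d, H, R, eta, sigma, the ring mixing matrix W,
# and the local targets a_i) are illustrative assumptions, not the paper's
# exact experimental configuration.

rng = np.random.default_rng(0)

n, d = 8, 5    # number of agents, problem dimension
H, R = 10, 50  # local gradient steps per round, communication rounds
eta = 0.05     # step size
sigma = 0.1    # gradient-noise level

# Each agent i holds f_i(x) = 0.5 * ||x - a_i||^2; the global objective
# F(x) = (1/n) * sum_i f_i(x) is minimized at the mean of the a_i.
a = rng.normal(size=(n, d))
x = np.zeros((n, d))  # one local iterate per agent

# Doubly stochastic mixing matrix for a ring network: each agent averages
# with its two neighbors. Its spectral gap governs how fast consensus
# information spreads, matching the role it plays in the stated bounds.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

for r in range(R):
    for _ in range(H):  # H local SGD steps using noisy gradients of f_i
        noisy_grad = (x - a) + sigma * rng.normal(size=x.shape)
        x -= eta * noisy_grad
    x = W @ x           # one gossip round: average with network neighbors

x_bar = x.mean(axis=0)  # average iterate across agents
print("optimality gap:", 0.5 * np.sum((x_bar - a.mean(axis=0)) ** 2))
```

The key design point the sketch isolates is that communication happens only once every $H$ local steps, so the total number of gossip rounds $R$ can be kept small and independent of the total iteration count $T = H \cdot R$.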
