Paper Title
Revisiting Optimal Convergence Rate for Smooth and Non-convex Stochastic Decentralized Optimization
Paper Authors
Paper Abstract
Decentralized optimization is effective at saving communication in large-scale machine learning. Although numerous algorithms have been proposed with theoretical guarantees and empirical successes, the performance limits of decentralized optimization, especially the influence of the network topology and its associated weight matrix on the optimal convergence rate, are not yet fully understood. While Lu and De Sa (2021) recently provided an optimal rate for non-convex stochastic decentralized optimization with weight matrices defined over linear graphs, the optimal rate with general weight matrices remains unclear. This paper revisits non-convex stochastic decentralized optimization and establishes an optimal convergence rate with general weight matrices. In addition, we establish the optimal rate when the non-convex loss functions further satisfy the Polyak-Łojasiewicz (PL) condition. These results cannot be achieved by following existing lines of analysis in the literature. Instead, we leverage the Ring-Lattice graph to admit general weight matrices while maintaining the optimal relation between the graph diameter and the weight matrix connectivity. Lastly, we develop a new decentralized algorithm that nearly attains the above two optimal rates under additional mild conditions.
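For readers unfamiliar with the setting, the following is a minimal sketch of the standard formulation the abstract refers to; the symbols (f_i, \xi_i, \mu) are conventional notation assumed here, not quoted from the paper. Each of n nodes holds a local stochastic objective, and the network jointly solves

\min_{x \in \mathbb{R}^d} \; f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x), \qquad f_i(x) = \mathbb{E}_{\xi_i \sim \mathcal{D}_i}\big[ F_i(x; \xi_i) \big],

and the PL condition with modulus \mu > 0 reads

\frac{1}{2} \|\nabla f(x)\|^2 \;\ge\; \mu \big( f(x) - f^\star \big) \quad \text{for all } x \in \mathbb{R}^d.

The role of the Ring-Lattice graph can likewise be made concrete. The hedged Python sketch below (the helper names are hypothetical, and uniform neighbor averaging is only one of many valid weight choices) builds a symmetric, doubly stochastic weight matrix W on a ring-lattice topology and measures its connectivity via the spectral gap 1 - rho, where rho is the second-largest singular value of W:

import numpy as np

def ring_lattice_weight_matrix(n: int, k: int) -> np.ndarray:
    """Weight matrix on a ring-lattice graph: node i is connected to its
    k nearest neighbors on each side and averages uniformly over itself
    and its 2k neighbors. The result is symmetric and doubly stochastic."""
    W = np.zeros((n, n))
    for i in range(n):
        for offset in range(-k, k + 1):
            W[i, (i + offset) % n] = 1.0 / (2 * k + 1)
    return W

def spectral_gap(W: np.ndarray) -> float:
    """Returns 1 - rho, where rho is the second-largest singular value of W.
    A larger gap means a better-connected topology (faster consensus)."""
    singular_values = np.linalg.svd(W, compute_uv=False)  # sorted descending
    return 1.0 - singular_values[1]

if __name__ == "__main__":
    # Increasing k shrinks the graph diameter and widens the spectral gap,
    # illustrating the diameter-connectivity trade-off the abstract mentions.
    for n, k in [(16, 1), (16, 4), (64, 1), (64, 8)]:
        W = ring_lattice_weight_matrix(n, k)
        print(f"n={n:3d}, k={k:2d}: spectral gap = {spectral_gap(W):.4f}")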