Paper Title
An improved convergence analysis for decentralized online stochastic non-convex optimization
Paper Authors
Paper Abstract
In this paper, we study decentralized online stochastic non-convex optimization over a network of nodes. Integrating a technique called gradient tracking in decentralized stochastic gradient descent, we show that the resulting algorithm, GT-DSGD, enjoys certain desirable characteristics towards minimizing a sum of smooth non-convex functions. In particular, for general smooth non-convex functions, we establish non-asymptotic characterizations of GT-DSGD and derive the conditions under which it achieves network-independent performance that matches that of centralized minibatch SGD. In contrast, the existing results suggest that GT-DSGD is always network-dependent and is therefore strictly worse than centralized minibatch SGD. When the global non-convex function additionally satisfies the Polyak-Łojasiewicz (PL) condition, we establish the linear convergence of GT-DSGD up to a steady-state error with appropriate constant step-sizes. Moreover, under stochastic approximation step-sizes, we establish, for the first time, the optimal global sublinear convergence rate on almost every sample path, in addition to the asymptotically optimal sublinear rate in expectation. Since strongly convex functions are a special case of the functions satisfying the PL condition, our results are not only immediately applicable but also improve the currently known best convergence rates and their dependence on problem parameters.
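The abstract describes GT-DSGD only at a high level. The following is a minimal NumPy sketch of a decentralized stochastic gradient step combined with gradient tracking, under assumed notation: the mixing matrix `W`, the step-size `alpha`, and the toy quadratic local objectives are illustrative choices and are not taken from the paper or its analysis.

```python
import numpy as np

def gt_dsgd(stoch_grads, W, x0, alpha, num_iters, rng):
    """Illustrative sketch of decentralized SGD with gradient tracking.

    stoch_grads : list of callables; stoch_grads[i](x, rng) returns a
                  stochastic gradient of node i's local objective at x.
    W           : (n, n) doubly-stochastic mixing matrix of the network.
    x0          : (n, d) array of initial iterates, one row per node.
    alpha       : constant step-size (assumed, for illustration only).
    """
    n, d = x0.shape
    x = x0.copy()
    # Each node initializes its gradient tracker with a local stochastic gradient.
    g = np.stack([stoch_grads[i](x[i], rng) for i in range(n)])
    y = g.copy()
    for _ in range(num_iters):
        # Local descent step followed by network averaging (mixing with W).
        x_new = W @ (x - alpha * y)
        # Gradient-tracking update: mix the trackers and add the difference
        # of consecutive local stochastic gradients.
        g_new = np.stack([stoch_grads[i](x_new[i], rng) for i in range(n)])
        y = W @ y + g_new - g
        x, g = x_new, g_new
    return x

# Toy usage: n nodes each hold a noisy quadratic f_i(x) = 0.5 * ||x - b_i||^2,
# so the global minimizer is the average of the b_i.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 5, 3
    b = rng.normal(size=(n, d))
    grads = [lambda x, rng, bi=b[i]: (x - bi) + 0.01 * rng.normal(size=d)
             for i in range(n)]
    # Ring network with a symmetric doubly-stochastic weight matrix.
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i + 1) % n] = 0.25
        W[i, (i - 1) % n] = 0.25
    x_final = gt_dsgd(grads, W, np.zeros((n, d)), alpha=0.1,
                      num_iters=2000, rng=rng)
    print("nodes near the global minimizer:",
          np.allclose(x_final, b.mean(0), atol=0.05))
```

The key design point illustrated here is the tracker `y`: because `W` is doubly stochastic, the average of the trackers equals the average of the most recent local stochastic gradients at every iteration, so each node descends along an estimate of the global gradient rather than its own local one.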