Paper Title

Unified Analysis of Stochastic Gradient Methods for Composite Convex and Smooth Optimization

Paper Authors

Ahmed Khaled, Othmane Sebbouh, Nicolas Loizou, Robert M. Gower, Peter Richtárik

Paper Abstract

We present a unified theorem for the convergence analysis of stochastic gradient algorithms for minimizing a smooth and convex loss plus a convex regularizer. We do this by extending the unified analysis of Gorbunov, Hanzely & Richtárik (2020) and dropping the requirement that the loss function be strongly convex. Instead, we only rely on convexity of the loss function. Our unified analysis applies to a host of existing algorithms such as proximal SGD, variance reduced methods, quantization and some coordinate descent type methods. For the variance reduced methods, we recover the best known convergence rates as special cases. For proximal SGD, the quantization and coordinate type methods, we uncover new state-of-the-art convergence rates. Our analysis also includes any form of sampling and minibatching. As such, we are able to determine the minibatch size that optimizes the total complexity of variance reduced methods. We showcase this by obtaining a simple formula for the optimal minibatch size of two variance reduced methods (L-SVRG and SAGA). This optimal minibatch size not only improves the theoretical total complexity of the methods but also improves their convergence in practice, as we show in several experiments.
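To make the setting concrete, the abstract concerns problems of the form min_x f(x) + R(x), where f is a smooth convex loss (an average over data points) and R is a convex regularizer, solved by methods such as proximal SGD with minibatch sampling. The sketch below is not the authors' code; it is a minimal illustration of proximal SGD on an assumed example instance (a Lasso problem with an L1 regularizer, whose prox is soft-thresholding), with the step size, batch size, and iteration count chosen arbitrarily for demonstration.

```python
import numpy as np

def soft_threshold(x, tau):
    # Prox of tau * ||.||_1 (soft-thresholding), the proximal operator of the L1 regularizer.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_sgd_lasso(A, b, lam=0.1, gamma=1e-2, batch=8, iters=5000, seed=0):
    """Proximal SGD on (1/n) * sum_i 0.5*(a_i^T x - b_i)^2 + lam * ||x||_1.

    Illustrative composite problem only; lam, gamma, batch, iters are assumed values.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        idx = rng.choice(n, size=batch, replace=False)    # minibatch sampling
        Ai, bi = A[idx], b[idx]
        grad = Ai.T @ (Ai @ x - bi) / batch               # stochastic gradient of the smooth part
        x = soft_threshold(x - gamma * grad, gamma * lam) # proximal step on the regularizer
    return x

if __name__ == "__main__":
    # Small synthetic example: sparse ground truth recovered from noisy linear measurements.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 50))
    x_true = np.zeros(50)
    x_true[:5] = 1.0
    b = A @ x_true + 0.01 * rng.standard_normal(200)
    x_hat = prox_sgd_lasso(A, b)
    print("recovered support:", np.flatnonzero(np.abs(x_hat) > 0.1))
```

Variance reduced variants such as L-SVRG and SAGA, which the paper analyzes under the same unified theorem, modify only the stochastic gradient estimate in the loop above; the proximal step and the minibatch sampling remain the same.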
