Paper Title
High Probability Convergence Bounds for Non-convex Stochastic Gradient Descent with Sub-Weibull Noise
Paper Authors
Paper Abstract
Stochastic gradient descent is one of the most common iterative algorithms used in machine learning and its convergence analysis is a rich area of research. Understanding its convergence properties can help inform what modifications of it to use in different settings. However, most theoretical results either assume convexity or only provide convergence results in mean. This paper, on the other hand, proves convergence bounds in high probability without assuming convexity. Assuming strong smoothness, we prove high probability convergence bounds in two settings: (1) assuming the Polyak-Łojasiewicz inequality and norm sub-Gaussian gradient noise and (2) assuming norm sub-Weibull gradient noise. In the second setting, as an intermediate step to proving convergence, we prove a sub-Weibull martingale difference sequence self-normalized concentration inequality of independent interest. It extends Freedman-type concentration beyond the sub-exponential threshold to heavier-tailed martingale difference sequences. We also provide a post-processing method that picks a single iterate with a provable convergence guarantee as opposed to the usual bound for the unknown best iterate. Our convergence result for sub-Weibull noise extends the regime where stochastic gradient descent has equal or better convergence guarantees than stochastic gradient descent with modifications such as clipping, momentum, and normalization.
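For reference, a minimal sketch of the two assumptions named in the abstract, written with standard textbook formulations rather than the paper's exact statements (constants, normalizations, and the notation $\nabla f(x_k;\xi_k)$ for the stochastic gradient are conventional choices made here for illustration):

% Polyak-Lojasiewicz (PL) inequality: f attains a minimum value f^* and, for some mu > 0,
\[
\tfrac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\bigl(f(x) - f^*\bigr) \qquad \text{for all } x .
\]
% Norm sub-Weibull gradient noise with tail parameter theta >= 1/2 and scale K > 0:
\[
\mathbb{P}\bigl(\|\nabla f(x_k;\xi_k) - \nabla f(x_k)\| \ge t\bigr) \;\le\; 2\exp\!\bigl(-(t/K)^{1/\theta}\bigr) \qquad \text{for all } t \ge 0 .
\]
% Here theta = 1/2 recovers norm sub-Gaussian noise and theta = 1 recovers norm sub-exponential noise;
% theta > 1 corresponds to the heavier-tailed regime that motivates the paper's extension of
% Freedman-type martingale concentration beyond the sub-exponential threshold.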