Paper Title

Multiplicative noise and heavy tails in stochastic optimization

Paper Authors

Liam Hodgkinson, Michael W. Mahoney

Paper Abstract

Although stochastic optimization is central to modern machine learning, the precise mechanisms underlying its success, and in particular, the precise role of the stochasticity, still remain unclear. Modelling stochastic optimization algorithms as discrete random recurrence relations, we show that multiplicative noise, as it commonly arises due to variance in local rates of convergence, results in heavy-tailed stationary behaviour in the parameters. A detailed analysis is conducted for SGD applied to a simple linear regression problem, followed by theoretical results for a much larger class of models (including non-linear and non-convex) and optimizers (including momentum, Adam, and stochastic Newton), demonstrating that our qualitative results hold much more generally. In each case, we describe dependence on key factors, including step size, batch size, and data variability, all of which exhibit similar qualitative behavior to recent empirical results on state-of-the-art neural network models from computer vision and natural language processing. Furthermore, we empirically demonstrate how multiplicative noise and heavy-tailed structure improve capacity for basin hopping and exploration of non-convex loss surfaces, over commonly-considered stochastic dynamics with only additive noise and light-tailed structure.
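
The following is a minimal, hypothetical sketch (not the authors' code) of the mechanism the abstract describes: single-sample SGD on 1-D least-squares linear regression is a random linear recurrence w_{k+1} = (1 - eta * x_k^2) * w_k + eta * x_k * y_k, so the noise enters multiplicatively, and for a sufficiently large step size the stationary distribution of the iterates develops power-law tails. The step size, noise scales, sample counts, and the Hill-estimator helper below are illustrative choices, not taken from the paper.

```python
# Hypothetical illustration, not the authors' code. Single-sample SGD on
# 1-D least-squares regression is the random linear recurrence
#   w_{k+1} = (1 - eta * x_k**2) * w_k + eta * x_k * y_k,
# i.e. multiplicative noise plus an additive term. For a large enough step
# size the stationary distribution of w_k becomes heavy-tailed.
import numpy as np

rng = np.random.default_rng(0)

def sgd_1d_linear_regression(eta, n_steps, sigma_x=1.0, sigma_eps=1.0, w_star=0.0):
    """Run SGD with batch size 1 and return the trajectory of iterates."""
    w = 0.0
    trace = np.empty(n_steps)
    for k in range(n_steps):
        x = sigma_x * rng.standard_normal()
        y = w_star * x + sigma_eps * rng.standard_normal()
        w = (1.0 - eta * x * x) * w + eta * x * y  # multiplicative + additive noise
        trace[k] = w
    return trace

def hill_tail_index(samples, k=200):
    """Crude Hill estimator of the power-law tail index of |samples|
    (smaller index = heavier tails)."""
    a = np.sort(np.abs(samples))[::-1][: k + 1]
    return k / np.sum(np.log(a[:-1] / a[-1]))

# Discard a burn-in period and thin the trajectory so the remaining iterates
# are roughly independent draws from the stationary distribution.
trace = sgd_1d_linear_regression(eta=0.9, n_steps=200_000)
stationary = trace[100_000::50]
print("estimated tail index:", round(hill_tail_index(stationary), 2))
```

In this sketch, reducing the step size should yield a larger estimated tail index (lighter tails), consistent with the step-size dependence the abstract describes; a purely additive-noise recursion with a fixed contraction factor would instead remain light-tailed.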
