Shape Matters: Understanding the Implicit Bias of the Noise Covariance

Paper title

Shape Matters: Understanding the Implicit Bias of the Noise Covariance

Authors

Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma

Abstract

The noise in stochastic gradient descent (SGD) provides a crucial implicit regularization effect for training overparameterized models. Prior theoretical work largely focuses on spherical Gaussian noise, whereas empirical studies demonstrate that parameter-dependent noise (induced by mini-batches or label perturbation) is far more effective than Gaussian noise. This paper theoretically characterizes this phenomenon on a quadratically parameterized model introduced by Vaskevicius et al. and Woodworth et al. We show that in an overparameterized setting, SGD with label noise recovers the sparse ground truth from an arbitrary initialization, whereas both SGD with Gaussian noise and gradient descent overfit to dense solutions with large norms. Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not. Code for our project is publicly available.
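The difference between parameter-dependent label noise and spherical Gaussian noise can be made concrete with a minimal Python sketch. It assumes the model f_v(x) = <v ⊙ v, x> as a stand-in for the quadratically parameterized model, a squared loss, and a label perturbed by ±σ. The paper's exact parameterization, noise distributions, and step sizes may differ. All function names are hypothetical. The sketch only shows where each kind of noise enters the update, not the paper's experiments. With label noise, the perturbation added to v_j works out to lr·σ·ξ·2·v_j·x_j (ξ = ±1), so its covariance depends on the current parameters and vanishes on coordinates where v_j = 0. The Gaussian update adds N(0, σ²I) regardless of v.

```python
import random
from typing import List, Sequence

Vector = List[float]


def predict(v: Sequence[float], x: Sequence[float]) -> float:
    """Assumed quadratically parameterized linear model: f_v(x) = <v * v, x>."""
    return sum(vi * vi * xi for vi, xi in zip(v, x))


def sq_loss(v: Sequence[float], x: Sequence[float], y: float) -> float:
    """Per-example squared loss 0.5 * (f_v(x) - y)^2."""
    return 0.5 * (predict(v, x) - y) ** 2


def grad_sq_loss(v: Sequence[float], x: Sequence[float], y: float) -> Vector:
    """Gradient of sq_loss w.r.t. v: (f_v(x) - y) * 2 * v_j * x_j per coordinate."""
    r = predict(v, x) - y
    return [2.0 * r * vi * xi for vi, xi in zip(v, x)]


def label_noise_sgd_step(v: Sequence[float], x: Sequence[float], y: float,
                         lr: float, sigma: float, rng: random.Random) -> Vector:
    """One SGD step on a label perturbed by +/- sigma (hypothetical helper).

    The noise enters through the label, so the extra term added to v_j is
    lr * sigma * xi * 2 * v_j * x_j with xi = +/-1: it scales with v_j.
    """
    y_noisy = y + sigma * rng.choice((-1.0, 1.0))
    g = grad_sq_loss(v, x, y_noisy)
    return [vi - lr * gi for vi, gi in zip(v, g)]


def gaussian_noise_gd_step(v: Sequence[float], X: Sequence[Sequence[float]],
                           Y: Sequence[float], lr: float, sigma: float,
                           rng: random.Random) -> Vector:
    """Full-batch gradient step plus spherical Gaussian noise N(0, sigma^2 I).

    The added noise does not depend on v (hypothetical helper).
    """
    n, d = len(X), len(v)
    g = [0.0] * d
    for x, y in zip(X, Y):
        for j, gj in enumerate(grad_sq_loss(v, x, y)):
            g[j] += gj / n
    return [vi - lr * gj + sigma * rng.gauss(0.0, 1.0) for vi, gj in zip(v, g)]


def train_label_noise(v0: Sequence[float], X: Sequence[Sequence[float]],
                      Y: Sequence[float], lr: float, sigma: float,
                      steps: int, seed: int = 0) -> Vector:
    """Run label-noise SGD, sampling one training example uniformly per step."""
    rng = random.Random(seed)
    v = list(v0)
    for _ in range(steps):
        i = rng.randrange(len(X))
        v = label_noise_sgd_step(v, X[i], Y[i], lr, sigma, rng)
    return v
```

A comparison in the spirit of the abstract would draw inputs X, generate labels from a sparse nonnegative target w* = v* ⊙ v* (an assumption of this sketch), and run both updates from the same initialization. This sketch does not reproduce the paper's results. Learning rates and noise levels would need tuning to avoid divergence.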
