Paper Title

Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit

Authors

Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang

Abstract

There is mounting evidence of emergent phenomena in the capabilities of deep learning methods as we scale up datasets, model sizes, and training times. While there are some accounts of how these resources modulate statistical capacity, far less is known about their effect on the computational problem of model training. This work conducts such an exploration through the lens of learning a $k$-sparse parity of $n$ bits, a canonical discrete search problem which is statistically easy but computationally hard. Empirically, we find that a variety of neural networks successfully learn sparse parities, with discontinuous phase transitions in the training curves. On small instances, learning abruptly occurs at approximately $n^{O(k)}$ iterations; this nearly matches SQ lower bounds, despite the apparent lack of a sparse prior. Our theoretical analysis shows that these observations are not explained by a Langevin-like mechanism, whereby SGD "stumbles in the dark" until it finds the hidden set of features (a natural algorithm which also runs in $n^{O(k)}$ time). Instead, we show that SGD gradually amplifies the sparse solution via a Fourier gap in the population gradient, making continual progress that is invisible to loss and error metrics.
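To make the task in the abstract concrete, below is a minimal, illustrative sketch of the $(n, k)$-sparse parity problem: inputs are uniform vectors in $\{-1, +1\}^n$ and the label is the product of $k$ hidden coordinates, and a small two-layer ReLU network is trained with plain SGD. This is not the authors' exact experimental setup; the architecture, hinge loss, and all hyperparameters (`n`, `k`, `width`, learning rate, batch size, step count) are assumptions chosen for illustration. On small instances one typically sees a long plateau in test error followed by a sharp drop, consistent with the phase transitions described above.

```python
# Illustrative sketch of the (n, k)-sparse parity task; hyperparameters are assumptions.
import torch

torch.manual_seed(0)
n, k = 30, 3                      # ambient dimension and parity sparsity (assumed values)
secret = torch.randperm(n)[:k]    # hidden support S with |S| = k

def sample_batch(batch_size):
    """Uniform x in {-1, +1}^n with label y = prod_{i in S} x_i."""
    x = torch.randint(0, 2, (batch_size, n)).float() * 2 - 1
    y = x[:, secret].prod(dim=1)
    return x, y

width = 128
model = torch.nn.Sequential(
    torch.nn.Linear(n, width),
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=0.05)

for step in range(1, 50001):
    x, y = sample_batch(64)
    out = model(x).squeeze(-1)
    loss = torch.clamp(1 - y * out, min=0).mean()   # hinge loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            xt, yt = sample_batch(4096)
            err = (model(xt).squeeze(-1).sign() != yt).float().mean()
        print(f"step {step:6d}  hinge loss {loss.item():.3f}  test error {err.item():.3f}")
```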
