Paper Title

The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks

Paper Authors

Wei Hu, Lechao Xiao, Ben Adlam, Jeffrey Pennington

Paper Abstract

Modern neural networks are often regarded as complex black-box functions whose behavior is difficult to understand owing to their nonlinear dependence on the data and the nonconvexity in their loss landscapes. In this work, we show that these common perceptions can be completely false in the early phase of learning. In particular, we formally prove that, for a class of well-behaved input distributions, the early-time learning dynamics of a two-layer fully-connected neural network can be mimicked by training a simple linear model on the inputs. We additionally argue that this surprising simplicity can persist in networks with more layers and with convolutional architecture, which we verify empirically. Key to our analysis is to bound the spectral norm of the difference between the Neural Tangent Kernel (NTK) at initialization and an affine transform of the data kernel; however, unlike many previous results utilizing the NTK, we do not require the network to have disproportionately large width, and the network is allowed to escape the kernel regime later in training.
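To make the abstract's key quantity concrete, below is a minimal numerical sketch, not the paper's construction: it computes the empirical NTK of a randomly initialized two-layer ReLU network on roughly isotropic data and checks how well an affine transform of the data kernel X X^T / d approximates it in spectral norm. The network scaling, the data model, and the least-squares fit of the affine coefficients are illustrative assumptions, not the constants derived in the paper.

```python
# Minimal sketch (illustrative assumptions only, not the paper's exact statement):
# compare the empirical NTK of a random two-layer ReLU network with an affine
# transform of the data kernel X X^T / d, measured in spectral norm.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 200, 100, 4096            # samples, input dimension, hidden width

# "Well-behaved" inputs for this sketch: isotropic Gaussian, normalized to norm sqrt(d).
X = rng.standard_normal((n, d))
X = X / np.linalg.norm(X, axis=1, keepdims=True) * np.sqrt(d)

# Two-layer network f(x) = (1/sqrt(m)) * a^T relu(W x / sqrt(d)) at random init.
W = rng.standard_normal((m, d))
a = rng.standard_normal(m)

pre = X @ W.T / np.sqrt(d)          # (n, m) pre-activations
act = np.maximum(pre, 0.0)          # relu outputs
ind = (pre > 0).astype(float)       # relu derivatives

# Empirical NTK Theta[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>,
# summing the contributions of the first-layer weights W and the output weights a.
G = X @ X.T / d                     # data kernel
Theta = ((ind * a**2) @ ind.T / m) * G + act @ act.T / m

# Affine transform of the data kernel: alpha * G + beta * 1 1^T, with alpha and
# beta fit by least squares here purely for illustration.
feats = np.stack([G.ravel(), np.ones(n * n)], axis=1)
coef, *_ = np.linalg.lstsq(feats, Theta.ravel(), rcond=None)
approx = (feats @ coef).reshape(n, n)

rel_err = np.linalg.norm(Theta - approx, 2) / np.linalg.norm(Theta, 2)
print(f"relative spectral-norm error: {rel_err:.3f}")   # small when d and m are large
```

In the paper, a bound of this kind is what allows early-time gradient descent on the network to be coupled to a simple linear model; the sketch above only checks the kernel approximation numerically at initialization, under the stated assumptions.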
