Paper Title
Taylorized Training: Towards Better Approximation of Neural Network Training at Finite Width
Paper Authors
Paper Abstract
We propose \emph{Taylorized training} as an initiative towards better understanding neural network training at finite width. Taylorized training involves training the $k$-th order Taylor expansion of the neural network at initialization, and is a principled extension of linearized training---a recently proposed theory for understanding the success of deep learning. We experiment with Taylorized training on modern neural network architectures, and show that Taylorized training (1) agrees with full neural network training increasingly better as we increase $k$, and (2) can significantly close the performance gap between linearized and full training. Compared with linearized training, higher-order training works in more realistic settings such as standard parameterization and large (initial) learning rate. We complement our experiments with theoretical results showing that the approximation error of $k$-th order Taylorized models decays exponentially over $k$ in wide neural networks.
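As a concrete, unofficial illustration of what "training the $k$-th order Taylor expansion of the network at initialization" means, the sketch below evaluates such a Taylorized model in JAX. The function names (`taylorized_apply`, `apply_fn`, `params0`) and the use of nested `jax.jvp` calls are illustrative assumptions, not the authors' implementation; the forward pass is expanded in the parameters around their initial values, and this surrogate output would then be trained in place of the full network.

```python
# Minimal sketch (assumed, not the authors' code) of a k-th order Taylorized model in JAX.
# The network output apply_fn(params, x) is Taylor-expanded in the parameters around
# their values at initialization (params0), along the direction (params - params0).
import math

import jax


def taylorized_apply(apply_fn, params0, params, x, k):
    """Evaluate the k-th order Taylor expansion of apply_fn(params, x) around params0."""
    # Parameter displacement from initialization; same pytree structure as params.
    delta = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)

    def directional_derivative(f):
        # Map f: params -> output to its directional derivative along delta.
        return lambda p: jax.jvp(f, (p,), (delta,))[1]

    out = apply_fn(params0, x)           # 0th-order term: the network at initialization
    f = lambda p: apply_fn(p, x)
    for j in range(1, k + 1):
        f = directional_derivative(f)    # j-fold nested jvp = j-th directional derivative
        out = out + f(params0) / math.factorial(j)
    return out
```

With $k = 1$ this reduces to the linearized (first-order) model that the abstract describes Taylorized training as extending; larger $k$ adds higher-order Taylor terms of the network in its parameters.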