Paper Title

Directional convergence and alignment in deep learning

Paper Authors

Ziwei Ji, Matus Telgarsky

Paper Abstract

In this paper, we show that although the minimizers of cross-entropy and related classification losses are off at infinity, network weights learned by gradient flow converge in direction, with an immediate corollary that network predictions, training errors, and the margin distribution also converge. This proof holds for deep homogeneous networks -- a broad class of networks allowing for ReLU, max-pooling, linear, and convolutional layers -- and we additionally provide empirical support not just close to the theory (e.g., the AlexNet), but also on non-homogeneous networks (e.g., the DenseNet). If the network further has locally Lipschitz gradients, we show that these gradients also converge in direction, and asymptotically align with the gradient flow path, with consequences on margin maximization, convergence of saliency maps, and a few other settings. Our analysis complements and is distinct from the well-known neural tangent and mean-field theories, and in particular makes no requirements on network width and initialization, instead merely requiring perfect classification accuracy. The proof proceeds by developing a theory of unbounded nonsmooth Kurdyka-Łojasiewicz inequalities for functions definable in an o-minimal structure, and is also applicable outside deep learning.
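
As an informal illustration of the directional-convergence phenomenon described in the abstract (not the paper's own experiments or code), the sketch below runs plain gradient descent with logistic loss on a linearly separable toy dataset, the simplest homogeneous setting: the weight norm keeps growing, yet the normalized weight vector w/||w|| stabilizes. The toy data, variable names, and hyperparameters are choices made here for illustration only.

```python
# Minimal sketch (not from the paper): directional convergence of gradient
# descent with logistic loss on a linearly separable toy problem.  The weight
# norm diverges, but the direction w / ||w|| converges.
import numpy as np

rng = np.random.default_rng(0)

# Toy data, separable through the origin with a positive margin.
w_star = np.array([2.0, -1.0])            # hypothetical reference direction
X = rng.normal(size=(400, 2))
scores = X @ w_star
keep = np.abs(scores) > 0.5               # enforce a margin
X, y = X[keep], np.sign(scores[keep])     # labels in {-1, +1}

def grad(w):
    """Gradient of the mean logistic loss (1/n) * sum_i log(1 + exp(-y_i <w, x_i>))."""
    margins = y * (X @ w)
    coeff = -y / (1.0 + np.exp(margins))  # per-example derivative w.r.t. the margin
    return X.T @ coeff / len(y)

w = np.zeros(2)
lr = 0.5
for t in range(1, 100001):
    w -= lr * grad(w)
    if t % 20000 == 0:
        print(f"step {t:6d}  ||w|| = {np.linalg.norm(w):7.3f}  "
              f"w/||w|| = {np.round(w / np.linalg.norm(w), 4)}")
```

In this linear special case the printed norm grows without bound while the printed direction settles; the theorem summarized above establishes the analogous directional convergence for deep homogeneous networks trained by gradient flow, assuming perfect classification accuracy is reached.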
