Paper Title

Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology

Paper Authors

Quynh Nguyen, Marco Mondelli

Paper Abstract

Recent works have shown that gradient descent can find a global minimum for over-parameterized neural networks where the widths of all the hidden layers scale polynomially with $N$ ($N$ being the number of training samples). In this paper, we prove that, for deep networks, a single layer of width $N$ following the input layer suffices to ensure a similar guarantee. In particular, all the remaining layers are allowed to have constant widths, and form a pyramidal topology. We show an application of our result to the widely used LeCun's initialization and obtain an over-parameterization requirement for the single wide layer of order $N^2$.
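To make the architecture described in the abstract concrete, below is a minimal PyTorch-style sketch of a network with one wide layer of width $N$ right after the input, followed by a pyramidal (non-increasing width) tail, with LeCun initialization (zero-mean Gaussian weights of variance $1/\mathrm{fan\_in}$). The specific widths, the Tanh activation, and the sample/input dimensions are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions): N training samples, input dimension d.
N, d = 100, 10

# One wide layer of width N right after the input, then a pyramidal
# (non-increasing) tail of narrow layers ending in a scalar output.
widths = [d, N, 50, 20, 10, 1]

layers = []
for fan_in, fan_out in zip(widths[:-1], widths[1:]):
    linear = nn.Linear(fan_in, fan_out)
    # LeCun initialization: zero-mean Gaussian weights with variance 1 / fan_in.
    nn.init.normal_(linear.weight, mean=0.0, std=fan_in ** -0.5)
    nn.init.zeros_(linear.bias)
    layers += [linear, nn.Tanh()]  # smooth activation; this choice is illustrative

model = nn.Sequential(*layers[:-1])  # no activation after the final (output) layer

x = torch.randn(N, d)   # a batch of N inputs
print(model(x).shape)   # torch.Size([100, 1])
```

Per the abstract, under LeCun's initialization the over-parameterization requirement for the single wide layer is of order $N^2$, so in that regime the wide layer's width would be taken on the order of $N^2$ rather than $N$ as in this sketch.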
