Paper Title
Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology
Paper Authors
Paper Abstract
Recent works have shown that gradient descent can find a global minimum for over-parameterized neural networks where the widths of all the hidden layers scale polynomially with $N$ ($N$ being the number of training samples). In this paper, we prove that, for deep networks, a single layer of width $N$ following the input layer suffices to ensure a similar guarantee. In particular, all the remaining layers are allowed to have constant widths and form a pyramidal topology. We show an application of our result to the widely used LeCun's initialization and obtain an over-parameterization requirement for the single wide layer of order $N^2$.
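To make the architecture described in the abstract concrete, below is a minimal NumPy sketch (not the authors' code) of a network whose first hidden layer has width $N$ equal to the number of training samples, followed by constant-width layers forming a (non-increasing) pyramidal topology, with LeCun-style fan-in initialization. The choice of activation (tanh), the output dimension, and the specific widths and depth are illustrative assumptions, not requirements stated in the paper.

```python
import numpy as np

def lecun_init(fan_in, fan_out, rng):
    # LeCun initialization: zero-mean Gaussian entries with variance 1 / fan_in.
    return rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, fan_out))

def build_pyramidal_net(d_in, N, constant_width, depth, rng):
    """Widths: one wide hidden layer of size N right after the input,
    then (depth - 1) hidden layers of non-increasing (here constant) width,
    ending in a scalar output."""
    widths = [d_in, N] + [constant_width] * (depth - 1) + [1]
    return [lecun_init(widths[l], widths[l + 1], rng) for l in range(len(widths) - 1)]

def forward(weights, X):
    # Plain fully connected forward pass; tanh is one illustrative smooth activation.
    h = X
    for W in weights[:-1]:
        h = np.tanh(h @ W)
    return h @ weights[-1]

# Toy usage: N training samples, first hidden layer of width N.
rng = np.random.default_rng(0)
N, d_in = 64, 10
X = rng.normal(size=(N, d_in))
weights = build_pyramidal_net(d_in=d_in, N=N, constant_width=32, depth=3, rng=rng)
print(forward(weights, X).shape)  # (N, 1)
```

Here only the first hidden layer scales with $N$; the later layers keep a fixed width of 32, which satisfies the pyramidal (non-increasing width) condition used in the paper's setting.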