Paper Title
Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology
Paper Authors
Paper Abstract
Recent works have shown that gradient descent can find a global minimum for over-parameterized neural networks where the widths of all the hidden layers scale polynomially with $N$ ($N$ being the number of training samples). In this paper, we prove that, for deep networks, a single layer of width $N$ following the input layer suffices to ensure a similar guarantee. In particular, all the remaining layers are allowed to have constant widths and form a pyramidal topology. We show an application of our result to the widely used LeCun's initialization and obtain an over-parameterization requirement for the single wide layer of order $N^2$.
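To make the architecture described in the abstract concrete, below is a minimal NumPy sketch (not the authors' code) of a network whose first hidden layer has width $N$ equal to the number of training samples, followed by constant-width layers forming a (non-increasing) pyramidal topology, with LeCun-style fan-in initialization. The choice of activation (tanh), the output dimension, and the specific widths and depth are illustrative assumptions, not requirements stated in the paper.

```python
import numpy as np

def lecun_init(fan_in, fan_out, rng):
    # LeCun initialization: zero-mean Gaussian entries with variance 1 / fan_in.
    return rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, fan_out))

def build_pyramidal_net(d_in, N, constant_width, depth, rng):
    """Widths: one wide hidden layer of size N right after the input,
    then (depth - 1) hidden layers of non-increasing (here constant) width,
    ending in a scalar output."""
    widths = [d_in, N] + [constant_width] * (depth - 1) + [1]
    return [lecun_init(widths[l], widths[l + 1], rng) for l in range(len(widths) - 1)]

def forward(weights, X):
    # Plain fully connected forward pass; tanh is one illustrative smooth activation.
    h = X
    for W in weights[:-1]:
        h = np.tanh(h @ W)
    return h @ weights[-1]

# Toy usage: N training samples, first hidden layer of width N.
rng = np.random.default_rng(0)
N, d_in = 64, 10
X = rng.normal(size=(N, d_in))
weights = build_pyramidal_net(d_in=d_in, N=N, constant_width=32, depth=3, rng=rng)
print(forward(weights, X).shape)  # (N, 1)
```

Here only the first hidden layer scales with $N$; the later layers keep a fixed width of 32, which satisfies the pyramidal (non-increasing width) condition used in the paper's setting.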