Paper Title
Global Convergence of SGD On Two Layer Neural Nets
Paper Authors
Paper Abstract
In this note, we consider the appropriately regularized $\ell_2$-empirical risk of depth-$2$ nets with any number of gates and show bounds on how the empirical loss evolves for SGD iterates on it -- for arbitrary data and for activations that are adequately smooth and bounded, like sigmoid and tanh. This in turn leads to a proof of global convergence of SGD for a special class of initializations. We also prove an exponentially fast convergence rate for continuous-time SGD that also applies to smooth unbounded activations like SoftPlus. Our key idea is to show the existence of Frobenius norm regularized loss functions on constant-sized neural nets which are "Villani functions", and thus to build on recent progress in analyzing SGD on such objectives. Most critically, the amount of regularization required for our analysis is independent of the size of the net.
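As a rough illustration (in our own notation, not necessarily the paper's), the kind of objective referred to above -- a Frobenius norm regularized $\ell_2$-empirical risk of a depth-$2$ net with $p$ gates on data $\{(x_i, y_i)\}_{i=1}^{n}$ -- can be sketched as

$$ f(x; W, a) = \sum_{j=1}^{p} a_j \, \sigma\!\left(\langle w_j, x \rangle\right), \qquad \widetilde{L}(W, a) = \frac{1}{n} \sum_{i=1}^{n} \big( y_i - f(x_i; W, a) \big)^2 \; + \; \lambda \, \lVert W \rVert_F^2, $$

where $\sigma$ is a smooth activation (e.g. sigmoid, tanh, or SoftPlus), $W$ stacks the rows $w_j$, and $\lambda > 0$ is the regularization strength; per the abstract, the amount of regularization needed for the analysis does not depend on the size of the net.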