Paper Title

Symmetry & critical points for a model shallow neural network

Paper Authors

Arjevani, Yossi; Field, Michael

Paper Abstract

We consider the optimization problem associated with fitting two-layer ReLU networks with $k$ hidden neurons, where labels are assumed to be generated by a (teacher) neural network. We leverage the rich symmetry exhibited by such models to identify various families of critical points and express them as power series in $k^{-\frac{1}{2}}$. These expressions are then used to derive estimates for several related quantities which imply that not all spurious minima are alike. In particular, we show that while the loss function at certain types of spurious minima decays to zero like $k^{-1}$, in other cases the loss converges to a strictly positive constant. The methods used depend on symmetry, the geometry of group actions, bifurcation, and Artin's implicit function theorem.
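To make the student-teacher setting concrete, here is a minimal sketch that estimates the population squared loss of a two-layer ReLU student fit to a ReLU teacher under standard Gaussian inputs. This is not the paper's method: the paper analyzes the exact population loss, whereas this sketch uses a Monte Carlo estimate; the identity teacher, the 1/2 normalization, and the name `student_teacher_relu_loss` are all illustrative assumptions made here.

```python
import numpy as np

def student_teacher_relu_loss(W, V, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the population squared loss
    (1/2) E_x [ (sum_i relu(<w_i, x>) - sum_j relu(<v_j, x>))^2 ]
    with x ~ N(0, I_d); rows of W are the student's hidden neurons,
    rows of V the teacher's. Sketch only: the 1/2 factor and the
    sampling-based estimate are illustrative choices, not the paper's.
    """
    rng = np.random.default_rng(seed)
    d = W.shape[1]
    X = rng.standard_normal((n_samples, d))           # inputs x ~ N(0, I_d)
    student = np.maximum(X @ W.T, 0.0).sum(axis=1)    # sum_i relu(<w_i, x>)
    teacher = np.maximum(X @ V.T, 0.0).sum(axis=1)    # sum_j relu(<v_j, x>)
    return 0.5 * np.mean((student - teacher) ** 2)

# Illustrative usage: k = d hidden neurons, teacher fixed to the identity
# (a highly symmetric configuration of the kind the paper's analysis exploits),
# student initialized as a small perturbation of the teacher.
k = d = 6
V = np.eye(d)
W = V + 0.1 * np.random.default_rng(1).standard_normal((k, d))
print(student_teacher_relu_loss(W, V))  # near 0 for W close to V
```

As `n_samples` grows, the estimate converges to the population loss whose critical points and spurious minima the paper studies.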
