Paper Title

Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold

Authors

Can Yaras, Peng Wang, Zhihui Zhu, Laura Balzano, Qing Qu

Abstract

When training overparameterized deep networks for classification tasks, it has been widely observed that the learned features exhibit a so-called "neural collapse" phenomenon. More specifically, for the output features of the penultimate layer, for each class the within-class features converge to their means, and the means of different classes exhibit a certain tight frame structure, which is also aligned with the last layer's classifier. As feature normalization in the last layer becomes a common practice in modern representation learning, in this work we theoretically justify the neural collapse phenomenon for normalized features. Based on an unconstrained feature model, we simplify the empirical loss function in a multi-class classification task into a nonconvex optimization problem over the Riemannian manifold by constraining all features and classifiers over the sphere. In this context, we analyze the nonconvex landscape of the Riemannian optimization problem over the product of spheres, showing a benign global landscape in the sense that the only global minimizers are the neural collapse solutions while all other critical points are strict saddles with negative curvature. Experimental results on practical deep networks corroborate our theory and demonstrate that better representations can be learned faster via feature normalization.
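
To make the setting concrete, the problem analyzed under the unconstrained feature model is roughly of the form: minimize (1/(nK)) Σ_k Σ_i CE(τ W h_{k,i}, y_k) over the classifiers w_k and the freely optimized features h_{k,i}, subject to ||w_k|| = ||h_{k,i}|| = 1, i.e., a nonconvex problem over a product of spheres. The following is a minimal numerical sketch (not the authors' code) of this setup in PyTorch: it enforces the sphere constraints through a normalization reparameterization rather than an explicit Riemannian retraction, runs plain gradient descent, and then checks the neural collapse properties. The dimensions K, n, d, the temperature tau, the step size, and all variable names are illustrative assumptions.

```python
# Minimal sketch: sphere-constrained unconstrained feature model.
# Features and classifiers are kept on the unit sphere by normalizing
# inside the forward pass; gradient descent on the raw variables then
# amounts to optimizing over the product of spheres.
import torch

torch.manual_seed(0)
K, n, d = 4, 5, 16      # classes, samples per class, feature dimension
tau = 1.0               # temperature scaling the cosine logits

H = torch.randn(K * n, d, requires_grad=True)   # free features h_{k,i}
W = torch.randn(K, d, requires_grad=True)       # classifiers w_k
labels = torch.arange(K).repeat_interleave(n)   # class label of each sample

opt = torch.optim.SGD([H, W], lr=0.5)
for _ in range(5000):
    opt.zero_grad()
    Hn = H / H.norm(dim=1, keepdim=True)        # project features onto the sphere
    Wn = W / W.norm(dim=1, keepdim=True)        # project classifiers onto the sphere
    loss = torch.nn.functional.cross_entropy(tau * Hn @ Wn.T, labels)
    loss.backward()
    opt.step()

with torch.no_grad():
    Hn = (H / H.norm(dim=1, keepdim=True)).view(K, n, d)
    Wn = W / W.norm(dim=1, keepdim=True)
    means = Hn.mean(dim=1)
    means = means / means.norm(dim=1, keepdim=True)
    # NC1: within-class features collapse to their class mean.
    print("within-class variability:", (Hn - means[:, None, :]).norm().item())
    # NC2: pairwise cosines between class means approach -1/(K-1),
    # the simplex equiangular tight frame (ETF) value.
    G = means @ means.T
    off = G[~torch.eye(K, dtype=torch.bool)]
    print("mean pairwise cosine (ETF target %.3f):" % (-1 / (K - 1)), off.mean().item())
    # NC3: each classifier aligns with its class mean (self-duality).
    print("classifier/mean alignment:", (Wn * means).sum(dim=1).mean().item())
```

Running this sketch, the within-class variability shrinks toward zero and the off-diagonal cosines between class means approach -1/(K-1), matching the neural collapse solutions that the paper identifies as the only global minimizers.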
