Paper Title

Optimization Theory for ReLU Neural Networks Trained with Normalization Layers

Paper Authors

Yonatan Dukler, Quanquan Gu, Guido Montúfar

Paper Abstract

The success of deep neural networks is in part due to the use of normalization layers. Normalization layers like Batch Normalization, Layer Normalization and Weight Normalization are ubiquitous in practice, as they improve generalization performance and speed up training significantly. Nonetheless, the vast majority of current deep learning theory and non-convex optimization literature focuses on the un-normalized setting, where the functions under consideration do not exhibit the properties of commonly normalized neural networks. In this paper, we bridge this gap by giving the first global convergence result for two-layer neural networks with ReLU activations trained with a normalization layer, namely Weight Normalization. Our analysis shows how the introduction of normalization layers changes the optimization landscape and can enable faster convergence as compared with un-normalized neural networks.
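As a rough illustration of the Weight Normalization setting the abstract refers to, the sketch below trains a two-layer ReLU network whose first-layer weights use the standard weight-normalization reparameterization w_k = g_k · v_k / ||v_k||, so gradient descent runs over the scales g and directions V rather than the raw weights. This is a minimal PyTorch sketch under assumed settings; the dimensions, learning rate, and fixed ±1/√m output layer are illustrative choices, not the paper's exact construction or analysis setting.

```python
# Minimal sketch (assumptions noted above): two-layer ReLU network with
# weight-normalized first layer, trained by plain gradient descent.
import torch

torch.manual_seed(0)

d, m, n = 10, 128, 64            # input dim, hidden width, sample count (illustrative)
X = torch.randn(n, d)
y = torch.randn(n)

# Trainable parameters: directions V and scales g; output weights c are fixed.
V = torch.randn(m, d, requires_grad=True)
g = torch.ones(m, requires_grad=True)
c = (torch.randint(0, 2, (m,)).float() * 2 - 1) / m ** 0.5  # fixed +/- 1/sqrt(m)

def forward(X):
    # Weight normalization: row k of W has direction V_k / ||V_k|| and length g_k.
    W = g.unsqueeze(1) * V / V.norm(dim=1, keepdim=True)
    return torch.relu(X @ W.t()) @ c

opt = torch.optim.SGD([V, g], lr=0.1)
for step in range(200):
    loss = 0.5 * ((forward(X) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```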
