Benign Overfitting in Two-layer Convolutional Neural Networks

Paper Title

Benign Overfitting in Two-layer Convolutional Neural Networks

Authors

Yuan Cao, Zixiang Chen, Mikhail Belkin, Quanquan Gu

Abstract

Modern neural networks often have great expressive power and can be trained to overfit the training data while still achieving good test performance. This phenomenon is referred to as "benign overfitting". Recently, a line of work has emerged that studies benign overfitting from a theoretical perspective. However, these works are limited to linear models or kernel/random feature models, and there is still a lack of theoretical understanding of when and how benign overfitting occurs in neural networks. In this paper, we study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN). We show that when the signal-to-noise ratio satisfies a certain condition, a two-layer CNN trained by gradient descent can achieve arbitrarily small training and test loss. On the other hand, when this condition does not hold, overfitting becomes harmful and the obtained CNN can only achieve a constant-level test loss. Together, these results demonstrate a sharp phase transition between benign overfitting and harmful overfitting, driven by the signal-to-noise ratio. To the best of our knowledge, this is the first work that precisely characterizes the conditions under which benign overfitting can occur in training convolutional neural networks.
