Paper Title

Benign overfitting in ridge regression

Paper Authors

Tsigler, A., Bartlett, P. L.

Paper Abstract

In many modern applications of deep learning the neural network has many more parameters than the data points used for its training. Motivated by those practices, a large body of recent theoretical research has been devoted to studying overparameterized models. One of the central phenomena in this regime is the ability of the model to interpolate noisy data, but still have test error lower than the amount of noise in that data. arXiv:1906.11300 characterized for which covariance structure of the data such a phenomenon can happen in linear regression if one considers the interpolating solution with minimum $\ell_2$-norm and the data has independent components: they gave a sharp bound on the variance term and showed that it can be small if and only if the data covariance has high effective rank in a subspace of small co-dimension. We strengthen and complete their results by eliminating the independence assumption and providing sharp bounds for the bias term. Thus, our results apply in a much more general setting than those of arXiv:1906.11300, e.g., kernel regression, and not only characterize how the noise is damped but also which part of the true signal is learned. Moreover, we extend the result to the setting of ridge regression, which allows us to explain another interesting phenomenon: we give general sufficient conditions under which the optimal regularization is negative.
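As a rough numerical sketch (not part of the paper), the snippet below contrasts the minimum $\ell_2$-norm interpolator, obtained as ridge regression with $\lambda = 0$ in its dual form, with ridge estimators at small positive and negative regularization on synthetic overparameterized data. The covariance spectrum, sample sizes, noise level, and $\lambda$ grid are arbitrary illustrative choices; whether a negative $\lambda$ actually improves the test error depends on the covariance structure and the true signal, as the paper's sufficient conditions make precise.

```python
# Illustrative sketch (not from the paper): minimum-norm interpolation vs.
# ridge regression, including a small negative regularization parameter.
# All constants below (spectrum, n, p, noise, lambda grid) are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 500                       # n samples, p >> n features

# Covariance with a few large eigenvalues and a long flat tail
# (high effective rank in the tail, the regime studied for benign overfitting).
eigs = np.concatenate([np.array([100.0, 50.0, 25.0]), np.full(p - 3, 0.1)])
X = rng.standard_normal((n, p)) * np.sqrt(eigs)   # rows have covariance diag(eigs)

theta_star = np.zeros(p)
theta_star[:3] = 1.0                 # true signal lives in the large-eigenvalue directions
y = X @ theta_star + 0.5 * rng.standard_normal(n)

def ridge(X, y, lam):
    """Ridge estimator in dual form: theta = X^T (X X^T + lam * n * I)^{-1} y.
    lam = 0 recovers the minimum-l2-norm interpolator; lam may be negative
    as long as X X^T + lam * n * I remains invertible."""
    n_samples = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * n_samples * np.eye(n_samples), y)

# Held-out (noiseless) test error for a few regularization levels, including lam < 0.
X_test = rng.standard_normal((2000, p)) * np.sqrt(eigs)
y_test = X_test @ theta_star
for lam in [0.0, 1e-2, 1e-1, -1e-3]:
    theta = ridge(X, y, lam)
    mse = np.mean((X_test @ theta - y_test) ** 2)
    print(f"lambda = {lam:+.4f}  test MSE = {mse:.3f}")
```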
