Paper Title

A Brief Prehistory of Double Descent

Paper Authors

Marco Loog, Tom Viering, Alexander Mey, Jesse H. Krijthe, David M. J. Tax

Paper Abstract

In their thought-provoking paper [1], Belkin et al. illustrate and discuss the shape of risk curves in the context of modern high-complexity learners. Given a fixed training sample size $n$, such curves show the risk of a learner as a function of some (approximate) measure of its complexity $N$. With $N$ the number of features, these curves are also referred to as feature curves. A salient observation in [1] is that these curves can display, what they call, double descent: with increasing $N$, the risk initially decreases, attains a minimum, and then increases until $N$ equals $n$, where the training data is fitted perfectly. Increasing $N$ even further, the risk decreases a second and final time, creating a peak at $N=n$. This twofold descent may come as a surprise, but as opposed to what [1] reports, it has not been overlooked historically. Our letter draws attention to some original, earlier findings, of interest to contemporary machine learning.
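
The abstract describes a feature curve: test risk as a function of the number of features $N$ at a fixed training size $n$, with a peak near the interpolation point $N = n$ and a second descent beyond it. Below is a minimal sketch of such a curve, assuming minimum-norm least squares on random ReLU features; the data model, feature map, and sizes are illustrative assumptions, not the exact experimental setup of [1].

```python
# Sketch of a double-descent feature curve: minimum-norm least squares
# on random nonlinear features. All choices (data model, feature map,
# sizes) are illustrative assumptions, not the setup used in [1].
import numpy as np

rng = np.random.default_rng(0)

n, n_test, d = 40, 1000, 5                 # fixed training size n, input dim d
X = rng.normal(size=(n, d))
X_test = rng.normal(size=(n_test, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)  # noisy linear target
y_test = X_test @ w_true

def random_features(X, W):
    """Random ReLU feature map: phi(x) = max(x @ W, 0)."""
    return np.maximum(X @ W, 0.0)

for N in [2, 5, 10, 20, 30, 38, 40, 42, 60, 100, 200, 400]:
    risks = []
    for _ in range(20):                    # average over random feature draws
        W = rng.normal(size=(d, N))
        Phi, Phi_test = random_features(X, W), random_features(X_test, W)
        # Minimum-norm least-squares fit; once N >= n this interpolates
        # the training data, and the risk typically peaks around N = n.
        beta = np.linalg.pinv(Phi) @ y
        risks.append(np.mean((Phi_test @ beta - y_test) ** 2))
    print(f"N = {N:4d}   mean test risk = {np.mean(risks):8.3f}")
```

Printed risks typically fall, rise sharply as $N$ approaches $n = 40$, and then fall again for $N \gg n$, tracing the twofold descent the abstract refers to.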
