Paper Title

On the bias of K-fold cross validation with stable learners

Paper Authors

Aghbalou, Anass, Portier, François, Sabourin, Anne

Paper Abstract

This paper investigates the efficiency of the K-fold cross-validation (CV) procedure and a debiased version thereof as a means of estimating the generalization risk of a learning algorithm. We work under the general assumption of uniform algorithmic stability. We show that the K-fold risk estimate may not be consistent under such general stability assumptions, by constructing non-vanishing lower bounds on the error in realistic contexts such as regularized empirical risk minimization and stochastic gradient descent. We thus advocate the use of a debiased version of the K-fold estimate and prove an error bound with exponential tail decay for this version. Our result is applicable to the large class of uniformly stable algorithms, in contrast to earlier works focusing on specific tasks such as density estimation. We illustrate the relevance of the debiased K-fold CV on a simple model selection problem and empirically demonstrate the usefulness of the proposed approach on real-world classification and regression datasets.
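For reference, below is a minimal sketch of the plain K-fold CV risk estimate discussed in the abstract, written in Python. The paper's debiased correction term is not reproduced here, and the names kfold_risk_estimate, fit, and loss are hypothetical, introduced only for illustration; the ridge-regression usage at the end is likewise an assumed example, not the experiment from the paper.

import numpy as np

def kfold_risk_estimate(X, y, fit, loss, K=5, seed=None):
    # Plain (uncorrected) K-fold CV estimate of the generalization risk.
    # fit(X_train, y_train) returns a fitted predictor;
    # loss(predictor, X_val, y_val) returns the mean loss on the validation fold.
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), K)
    fold_risks = []
    for k in range(K):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        predictor = fit(X[train], y[train])
        fold_risks.append(loss(predictor, X[val], y[val]))
    # Average validation risk over the K folds.
    return float(np.mean(fold_risks))

# Hypothetical usage: ridge regression on synthetic data.
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.5 * rng.normal(size=200)

fit = lambda Xt, yt: Ridge(alpha=1.0).fit(Xt, yt)
loss = lambda model, Xv, yv: mean_squared_error(yv, model.predict(Xv))
print(kfold_risk_estimate(X, y, fit, loss, K=5, seed=0))

Note that each fold's predictor is trained on only (1 - 1/K) of the data, which is the source of the bias that the paper's debiased estimator is designed to correct.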
