Paper Title
Deep Learning is Provably Robust to Symmetric Label Noise
Paper Authors
Paper Abstract
Deep neural networks (DNNs) are capable of perfectly fitting the training data, including memorizing noisy data. It is commonly believed that memorization hurts generalization. Therefore, many recent works propose mitigation strategies to avoid noisy data or correct memorization. In this work, we step back and ask the question: Can deep learning be robust against massive label noise without any mitigation? We provide an affirmative answer for the case of symmetric label noise: We find that certain DNNs, including under-parameterized and over-parameterized models, can tolerate massive symmetric label noise up to the information-theoretic threshold. By appealing to classical statistical theory and universal consistency of DNNs, we prove that for multiclass classification, $L_1$-consistent DNN classifiers trained under symmetric label noise can achieve Bayes optimality asymptotically if the label noise probability is less than $\frac{K-1}{K}$, where $K \ge 2$ is the number of classes. Our results show that for symmetric label noise, no mitigation is necessary for $L_1$-consistent estimators. We conjecture that for general label noise, mitigation strategies that make use of the noisy data will outperform those that ignore the noisy data.
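To make the setting concrete, the sketch below illustrates the symmetric label noise model described in the abstract: with probability `noise_prob`, a clean label is replaced by one of the other $K-1$ classes chosen uniformly at random. The function name and parameters are illustrative, not from the paper; the final loop simply checks the abstract's tolerance condition, noise probability $< \frac{K-1}{K}$.

```python
import numpy as np

def add_symmetric_label_noise(labels, noise_prob, num_classes, rng=None):
    """Corrupt integer class labels with symmetric (uniform) label noise.

    With probability `noise_prob`, each label is replaced by one of the
    other `num_classes - 1` classes chosen uniformly at random; otherwise
    it is left unchanged. Illustrative sketch, not the paper's code.
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    flip = rng.random(labels.shape) < noise_prob
    # Draw a uniform offset in {1, ..., K-1} and wrap around, so a flipped
    # label never coincides with the clean one.
    offsets = rng.integers(1, num_classes, size=labels.shape)
    return np.where(flip, (labels + offsets) % num_classes, labels)

# The abstract's condition: noise is tolerable when noise_prob < (K - 1) / K.
K = 10
for p in (0.3, 0.85, 0.95):
    status = "below threshold" if p < (K - 1) / K else "at or above threshold"
    print(f"noise_prob = {p}: {status}")
```

At the threshold $\frac{K-1}{K}$, the observed label carries no information about the clean class (all $K$ labels become equally likely), which is why the abstract calls it the information-theoretic limit.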