Paper Title
UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning
Paper Authors
Abstract
Supervised deep learning methods require a large repository of annotated data; hence, label noise is inevitable. Training with such noisy data negatively impacts the generalization performance of deep neural networks. To combat label noise, recent state-of-the-art methods employ a sample selection mechanism to select a possibly clean subset of the data. Next, an off-the-shelf semi-supervised learning method is used for training, where rejected samples are treated as unlabeled data. Our comprehensive analysis shows that current selection methods disproportionately select samples from easy (fast-learnable) classes while rejecting those from relatively harder ones. This creates class imbalance in the selected clean set and, in turn, deteriorates performance under high label noise. In this work, we propose UNICON, a simple yet effective sample selection method that is robust to high label noise. To address the disproportionate selection of easy and hard samples, we introduce a Jensen-Shannon divergence based uniform selection mechanism that does not require any probabilistic modeling or hyperparameter tuning. We complement our selection method with contrastive learning to further combat the memorization of noisy labels. Extensive experimentation on multiple benchmark datasets demonstrates the effectiveness of UNICON; we obtain an 11.4% improvement over the current state-of-the-art on the CIFAR100 dataset at a 90% noise rate. Our code is publicly available.
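The abstract's core idea can be illustrated with a minimal sketch. The snippet below is not the paper's implementation; it only assumes (a) each sample gets a Jensen-Shannon divergence score between the model's softmax prediction and the one-hot encoding of its given label, and (b) the same number of lowest-divergence samples is kept per class, so easy classes cannot crowd hard classes out of the clean set. Function names (`js_divergence`, `uniform_select`) and the `clean_frac` parameter are illustrative, not from the paper.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))  # KL(a || b)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def uniform_select(probs, labels, num_classes, clean_frac=0.5):
    """Class-uniform clean-sample selection by JS divergence (sketch).

    probs:  (N, C) softmax predictions
    labels: (N,) given, possibly noisy, integer labels
    Keeps the same number of lowest-divergence samples from every
    class, returning indices of the presumed-clean subset.
    """
    # Per-sample disagreement between prediction and (noisy) label.
    scores = np.array([
        js_divergence(probs[i], np.eye(num_classes)[labels[i]])
        for i in range(len(labels))
    ])
    per_class = int(clean_frac * len(labels) / num_classes)
    selected = []
    for c in range(num_classes):
        idx = np.where(labels == c)[0]
        # Lowest-divergence samples of this class look cleanest.
        selected.extend(idx[np.argsort(scores[idx])[:per_class]])
    return np.array(sorted(selected))
```

Because the per-class quota is fixed, a fast-learnable class with many confident predictions contributes no more samples than a hard class, which is the imbalance the abstract identifies in prior selection schemes.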