Paper Title

Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels

Authors

Ganlong Zhao, Guanbin Li, Yipeng Qin, Feng Liu, Yizhou Yu

Abstract

Deep models trained with noisy labels are prone to overfitting and struggle to generalize. Most existing solutions rest on the idealized assumption that label noise is class-conditional, i.e., that instances of the same class share the same noise model, which is independent of their features. In practice, however, real-world noise patterns are usually more fine-grained and instance-dependent, which poses a major challenge, especially in the presence of inter-class imbalance. In this paper, we propose a two-stage clean-sample identification method to address this challenge. First, we employ a class-level feature clustering procedure for the early identification of clean samples that lie near the class-wise prediction centers. Notably, we address the class imbalance problem by aggregating rare classes according to their prediction entropy. Second, for the remaining clean samples that lie close to the ground-truth class boundary (and are usually mixed with samples carrying instance-dependent noise), we propose a novel consistency-based classification method that identifies them using the consistency of two classifier heads: the higher the consistency, the larger the probability that a sample is clean. Extensive experiments on several challenging benchmarks demonstrate the superior performance of our method over the state of the art.
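
The two-stage idea can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the abstract does not specify the clustering procedure, the entropy-based aggregation of rare classes, or the consistency measure. The sketch therefore substitutes simple stand-ins, namely cosine similarity to the mean feature of each labeled class for "centrality", and one minus the total variation distance between the two heads' softmax outputs for "consistency". The function names `centrality_scores` and `consistency_scores` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero for all-zero vectors


def centrality_scores(features, noisy_labels, num_classes):
    """Stage 1 stand-in: cosine similarity of each sample to the mean
    (L2-normalized) feature of the class given by its noisy label.
    Higher scores mean the sample sits closer to its class center."""
    feats = features / (np.linalg.norm(features, axis=1, keepdims=True) + EPS)
    scores = np.zeros(len(feats))
    for c in range(num_classes):
        mask = noisy_labels == c
        if not mask.any():
            continue  # no sample carries this label
        center = feats[mask].mean(axis=0)
        center = center / (np.linalg.norm(center) + EPS)
        scores[mask] = feats[mask] @ center
    return scores


def consistency_scores(probs_a, probs_b):
    """Stage 2 stand-in: agreement between two classifier heads, measured
    as 1 minus the total variation distance between their predicted
    class distributions. 1.0 means identical predictions, 0.0 means
    the heads put their probability mass on disjoint classes."""
    return 1.0 - 0.5 * np.abs(probs_a - probs_b).sum(axis=1)
```

Under these assumptions, samples with high centrality would be treated as clean early on, and the remaining samples would be ranked by head consistency, with any selection threshold left as a tuning choice that the abstract does not fix.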
