Paper Title


Salvage Reusable Samples from Noisy Data for Robust Learning

Paper Authors

Zeren Sun, Xian-Sheng Hua, Yazhou Yao, Xiu-Shen Wei, Guosheng Hu, Jian Zhang

Paper Abstract


Due to the existence of label noise in web images and the high memorization capacity of deep neural networks, deep fine-grained (FG) models trained directly on web images tend to have inferior recognition ability. In the literature, to alleviate this issue, loss correction methods try to estimate the noise transition matrix, but the inevitable false corrections cause severe accumulated errors. Sample selection methods instead identify clean ("easy") samples via the small-loss criterion, which alleviates accumulated errors; however, "hard" and mislabeled examples, both of which can boost the robustness of FG models, are also dropped. To this end, we propose a certainty-based reusable sample selection and correction approach, termed CRSSC, for coping with label noise when training deep FG models with web images. Our key idea is to additionally identify and correct reusable samples, and then leverage them together with clean examples to update the networks. We demonstrate the superiority of the proposed approach from both theoretical and experimental perspectives.
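The training loop described in the abstract can be sketched as follows. This is a minimal illustration of the general idea, not the authors' implementation: the `keep_ratio` and `certainty_thresh` knobs, the use of normalized softmax entropy as the certainty measure, and relabeling by the model's argmax prediction are all assumptions made for the sake of the sketch.

```python
import numpy as np

def select_and_correct(losses, probs, labels, keep_ratio=0.6,
                       certainty_thresh=0.5):
    """Split a batch into clean, reusable (relabeled), and dropped samples.

    A hypothetical sketch: small-loss samples are kept as clean ("easy"),
    and among the remaining large-loss samples, those the model predicts
    with high certainty are treated as reusable and relabeled instead of
    being discarded outright.
    """
    n = len(losses)
    order = np.argsort(losses)          # small-loss-first ordering
    n_clean = int(keep_ratio * n)
    clean_idx = order[:n_clean]         # small-loss "easy" samples
    rest_idx = order[n_clean:]          # large-loss candidates

    # Certainty proxy: normalized entropy of the softmax output over the
    # remaining samples (low entropy means a confident prediction).
    p = probs[rest_idx]
    entropy = -np.sum(p * np.log(p + 1e-12), axis=1) / np.log(p.shape[1])
    certain = entropy < certainty_thresh

    reusable_idx = rest_idx[certain]    # confidently predicted "hard" cases
    dropped_idx = rest_idx[~certain]    # likely irrecoverable label noise

    # Correct reusable samples with the model's own prediction; the clean
    # and dropped samples keep their original labels.
    corrected = labels.copy()
    corrected[reusable_idx] = probs[reusable_idx].argmax(axis=1)
    return clean_idx, reusable_idx, dropped_idx, corrected
```

The network would then be updated on the union of `clean_idx` (with original labels) and `reusable_idx` (with corrected labels), while `dropped_idx` is excluded from the loss.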
