Is your noise correction noisy? PLS: Robustness to label noise with two stage detection

Paper title

Is your noise correction noisy? PLS: Robustness to label noise with two stage detection

Authors

Paul Albert, Eric Arazo, Tarun Krishna, Noel E. O'Connor, Kevin McGuinness

Abstract

Designing robust algorithms capable of training accurate neural networks on uncurated datasets from the web has been the subject of much research, as it reduces the need for time-consuming human labor. Many previous research contributions have focused on detecting different types of label noise; this paper instead proposes to improve the correction accuracy of noisy samples once they have been detected. Many state-of-the-art contributions adopt a two-phase approach in which noisy samples are first detected and a corrected pseudo-label is then guessed in a semi-supervised fashion. The guessed pseudo-labels are then used in the supervised objective without ensuring that the label guess is likely to be correct. This can lead to confirmation bias, which reduces noise robustness. Here we propose the pseudo-loss, a simple metric that we find to be strongly correlated with pseudo-label correctness on noisy samples. Using the pseudo-loss, we dynamically down-weight under-confident pseudo-labels throughout training to avoid confirmation bias and improve network accuracy. We additionally propose a confidence-guided contrastive objective that learns robust representations by interpolating between a class-bound (supervised) objective for confidently corrected samples and an unsupervised objective for under-confident label corrections. Experiments demonstrate the state-of-the-art performance of our Pseudo-Loss Selection (PLS) algorithm on a variety of benchmark datasets, including curated data synthetically corrupted with in-distribution and out-of-distribution noise, and two real-world web noise datasets. Our experiments are fully reproducible: github.com/PaulAlbert31/SNCF
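The down-weighting idea in the abstract can be illustrated with a small sketch. The abstract does not define the pseudo-loss or the weighting rule, so everything here is an assumption made for illustration: the pseudo-loss is taken to be the cross-entropy between a guessed pseudo-label and the network's predicted class probabilities, and the weight is a hypothetical `exp(-loss / temperature)` mapping. The function names `pseudo_loss` and `down_weights` are invented for this example and are not taken from the authors' repository.

```python
import math

def pseudo_loss(pred_probs, pseudo_label):
    """Cross-entropy between a pseudo-label and predicted class probabilities.

    ASSUMPTION: the abstract does not give a formula; cross-entropy is used
    here only as a plausible stand-in for the pseudo-loss.
    """
    eps = 1e-12  # avoids log(0) for zero-probability classes
    return -sum(q * math.log(max(p, eps))
                for q, p in zip(pseudo_label, pred_probs))

def down_weights(losses, temperature=1.0):
    """Map pseudo-losses to weights in (0, 1]; a lower loss gives a higher weight.

    ASSUMPTION: exp(-loss / temperature) is a hypothetical weighting chosen
    for illustration, not the rule used by PLS.
    """
    return [math.exp(-loss / temperature) for loss in losses]

# Two detected-noisy samples that both received the pseudo-label "class 0".
preds = [[0.90, 0.05, 0.05],   # network agrees confidently with the guess
         [0.40, 0.35, 0.25]]   # network is unsure about the guess
pseudo = [[1.0, 0.0, 0.0],
          [1.0, 0.0, 0.0]]

losses = [pseudo_loss(p, q) for p, q in zip(preds, pseudo)]
weights = down_weights(losses)
for loss, weight in zip(losses, weights):
    print(f"pseudo-loss={loss:.3f}  weight={weight:.3f}")
```

With one-hot pseudo-labels and `temperature=1.0`, this particular weighting reduces to the predicted probability of the pseudo-labelled class, so the unsure sample contributes less to a weighted supervised objective than the confident one.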
