Paper Title


Tackling Instance-Dependent Label Noise with Dynamic Distribution Calibration

Authors

Manyi Zhang, Yuxin Ren, Zihao Wang, Chun Yuan

Abstract


Instance-dependent label noise is realistic but rather challenging, as the label-corruption process depends directly on the instances. It causes a severe distribution shift between the distributions of training and test data, which impairs the generalization of trained models. Prior works put great effort into tackling the issue. Unfortunately, these works either rely heavily on strong assumptions or remain heuristic without theoretical guarantees. In this paper, to address the distribution shift in learning with instance-dependent label noise, a dynamic distribution-calibration strategy is adopted. Specifically, we hypothesize that, before the training data are corrupted by label noise, each class conforms to a multivariate Gaussian distribution at the feature level. Label noise produces outliers that shift this Gaussian distribution. During training, to calibrate the shifted distribution, we propose two methods based on the mean and the covariance of the multivariate Gaussian distribution, respectively. The mean-based method works in a recursive dimension-reduction manner for robust mean estimation, which is theoretically guaranteed to train a high-quality model against label noise. The covariance-based method works in a distribution-disturbance manner, which is experimentally verified to improve model robustness. We demonstrate the utility and effectiveness of our methods on datasets with synthetic label noise and real-world unknown noise.
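To make the mean-based idea concrete, here is a minimal sketch (not the paper's actual algorithm) of robust mean estimation by iterative outlier filtering: since outliers inflate the variance most along the top principal direction of the feature covariance, we repeatedly project the features onto that direction and trim the farthest points. The function name `robust_mean` and the trimming parameters are illustrative assumptions.

```python
import numpy as np

def robust_mean(features, n_iters=5, trim_frac=0.1):
    """Estimate a class mean robust to label-noise outliers (illustrative sketch).

    At each iteration, project the features onto the top eigenvector of
    their empirical covariance (the direction outliers inflate most) and
    drop the trim_frac fraction of points that deviate farthest along it.
    """
    x = np.asarray(features, dtype=float)
    for _ in range(n_iters):
        mu = x.mean(axis=0)
        centered = x - mu
        # Top eigenvector of the empirical covariance matrix.
        cov = centered.T @ centered / len(x)
        _, vecs = np.linalg.eigh(cov)
        v = vecs[:, -1]
        # Absolute deviation along the direction of maximal variance.
        scores = np.abs(centered @ v)
        keep = scores <= np.quantile(scores, 1.0 - trim_frac)
        if keep.all():
            break
        x = x[keep]
    return x.mean(axis=0)
```

On a class whose clean features are Gaussian but which contains a minority of mislabeled (outlier) features, this estimate stays close to the clean class mean, whereas the naive sample mean is pulled toward the outliers.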
