Paper Title
FedCorr: Multi-Stage Federated Learning for Label Noise Correction
Authors
Abstract
Federated learning (FL) is a privacy-preserving distributed learning paradigm that enables clients to jointly train a global model. In real-world FL implementations, client data could have label noise, and different clients could have vastly different label noise levels. Although there exist methods in centralized learning for tackling label noise, such methods do not perform well on heterogeneous label noise in FL settings, due to the typically smaller sizes of client datasets and data privacy requirements in FL. In this paper, we propose $\texttt{FedCorr}$, a general multi-stage framework to tackle heterogeneous label noise in FL, without making any assumptions on the noise models of local clients, while still maintaining client data privacy. In particular, (1) $\texttt{FedCorr}$ dynamically identifies noisy clients by exploiting the dimensionalities of the model prediction subspaces independently measured on all clients, and then identifies incorrect labels on noisy clients based on per-sample losses. To deal with data heterogeneity and to increase training stability, we propose an adaptive local proximal regularization term that is based on estimated local noise levels. (2) We further finetune the global model on identified clean clients and correct the noisy labels for the remaining noisy clients after finetuning. (3) Finally, we apply the usual training on all clients to make full use of all local data. Experiments conducted on CIFAR-10/100 with federated synthetic label noise, and on a real-world noisy dataset, Clothing1M, demonstrate that $\texttt{FedCorr}$ is robust to label noise and substantially outperforms the state-of-the-art methods at multiple noise levels.
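The adaptive local proximal regularization mentioned in stage (1) can be illustrated with a small sketch: each client minimizes its task loss plus a penalty on drift from the current global weights, with the penalty scaled by that client's estimated noise level. The linear scaling via `base_mu * est_noise_level` and the linear-softmax model are illustrative assumptions for this sketch, not necessarily FedCorr's exact formulation.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # Numerically stable softmax cross-entropy, averaged over samples.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def local_objective(w, w_global, X, y, est_noise_level, base_mu=1.0):
    """Client objective: cross-entropy + adaptive proximal term.

    The proximal weight grows with the client's estimated local noise
    level, so noisier clients are pulled more strongly toward the
    global model (the exact weighting schedule here is an assumption).
    """
    ce = softmax_cross_entropy(X @ w, y)
    mu = base_mu * est_noise_level
    prox = 0.5 * mu * np.sum((w - w_global) ** 2)
    return ce + prox
```

With `est_noise_level = 0` the objective reduces to plain cross-entropy, so identified clean clients train essentially unregularized, while noisy clients are stabilized against overfitting their corrupted labels.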
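Stage (2)'s label correction step can likewise be sketched: after finetuning on the clean clients, samples on noisy clients whose per-sample loss is high are relabeled with the global model's predictions. The fixed `loss_threshold` below is an illustrative stand-in for the paper's per-sample-loss based identification.

```python
import numpy as np

def correct_labels(per_sample_losses, global_preds, labels, loss_threshold):
    """Relabel suspected-noisy samples with the global model's predictions.

    per_sample_losses : loss of each sample under the finetuned global model
    global_preds      : the global model's predicted class per sample
    labels            : the client's current (possibly noisy) labels
    loss_threshold    : cutoff above which a label is treated as incorrect
                        (an assumption for this sketch)
    """
    noisy_mask = per_sample_losses > loss_threshold
    corrected = labels.copy()
    corrected[noisy_mask] = global_preds[noisy_mask]
    return corrected, noisy_mask
```

After this correction, stage (3) resumes ordinary federated training over all clients, so the relabeled data contributes to the final model.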