Paper Title
FedCVT: Semi-supervised Vertical Federated Learning with Cross-view Training
Paper Authors
Paper Abstract
Federated learning allows multiple parties to build machine learning models collaboratively without exposing data. In particular, vertical federated learning (VFL) enables participating parties to build a joint machine learning model based on distributed features of aligned samples. However, VFL requires all parties to share a sufficient amount of aligned samples. In reality, the set of aligned samples may be small, leaving the majority of the non-aligned data unused. In this article, we propose Federated Cross-view Training (FedCVT), a semi-supervised learning approach that improves the performance of the VFL model with limited aligned samples. More specifically, FedCVT estimates representations for missing features, predicts pseudo-labels for unlabeled samples to expand the training set, and trains three classifiers jointly based on different views of the expanded training set to improve the VFL model's performance. FedCVT does not require parties to share their original data and model parameters, thus preserving data privacy. We conduct experiments on the NUS-WIDE, Vehicle, and CIFAR10 datasets. The experimental results demonstrate that FedCVT significantly outperforms vanilla VFL that only utilizes aligned samples. Finally, we perform ablation studies to investigate the contribution of each component of FedCVT to its performance. Code is available at https://github.com/yankang18/FedCVT
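To make the three steps named in the abstract concrete (estimating representations for missing features, pseudo-labeling non-aligned samples, and jointly training three view-specific classifiers), the following is a minimal, self-contained PyTorch sketch of a cross-view training loop on synthetic data. It is not the authors' implementation (see the linked repository for that); the two-party setup, the attention-based representation estimator, all network sizes, and the 0.9 confidence threshold are illustrative assumptions.

```python
# Toy sketch of cross-view training for two-party VFL. All dimensions,
# the representation estimator, and the confidence threshold are assumed
# for illustration; they are not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Party A holds features for all samples; party B's features are only
# available for the first `n_aligned` (aligned and labeled) samples.
n_total, n_aligned, d_a, d_b, d_rep, n_classes = 200, 40, 8, 6, 16, 3
x_a = torch.randn(n_total, d_a)            # party A's local features
x_b = torch.randn(n_aligned, d_b)          # party B's aligned features only
y_aligned = torch.randint(0, n_classes, (n_aligned,))

enc_a = nn.Linear(d_a, d_rep)              # party A's representation learner
enc_b = nn.Linear(d_b, d_rep)              # party B's representation learner
clf_a = nn.Linear(d_rep, n_classes)        # classifier on A's view
clf_b = nn.Linear(d_rep, n_classes)        # classifier on B's view
clf_ab = nn.Linear(2 * d_rep, n_classes)   # classifier on the combined view

params = [p for m in (enc_a, enc_b, clf_a, clf_b, clf_ab)
          for p in m.parameters()]
opt = torch.optim.Adam(params, lr=1e-2)

for step in range(200):
    r_a = enc_a(x_a)                       # representations for all A samples
    r_b = enc_b(x_b)                       # representations for aligned B rows

    # Step 1: estimate B's missing representations for A's non-aligned
    # samples via attention over the aligned pairs (a stand-in estimator).
    attn = F.softmax(r_a[n_aligned:] @ r_b.t() / d_rep ** 0.5, dim=1)
    r_b_full = torch.cat([r_b, attn @ r_b], dim=0)

    logits_a = clf_a(r_a)
    logits_b = clf_b(r_b_full)
    logits_ab = clf_ab(torch.cat([r_a, r_b_full], dim=1))

    # Supervised loss on aligned, labeled samples for all three views.
    sup = sum(F.cross_entropy(l[:n_aligned], y_aligned)
              for l in (logits_a, logits_b, logits_ab))

    # Step 2: pseudo-label non-aligned samples where the combined view is
    # confident; step 3: use them to co-train the per-view classifiers.
    with torch.no_grad():
        probs = F.softmax(logits_ab[n_aligned:], dim=1)
        conf, pseudo = probs.max(dim=1)
        mask = conf > 0.9                  # assumed confidence threshold
    unsup = torch.tensor(0.0)
    if mask.any():
        unsup = (F.cross_entropy(logits_a[n_aligned:][mask], pseudo[mask]) +
                 F.cross_entropy(logits_b[n_aligned:][mask], pseudo[mask]))

    loss = sup + unsup
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.3f}")
```

In this sketch both parties' encoders live in one process for readability; in an actual VFL deployment each encoder would stay with its owner and only intermediate representations would be exchanged, which is how raw data and model parameters remain private.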