Paper Title
Scalable Collaborative Learning via Representation Sharing
Paper Authors
Paper Abstract
Privacy-preserving machine learning has become a key conundrum for multi-party artificial intelligence. Federated Learning (FL) and Split Learning (SL) are two frameworks that enable collaborative learning while keeping the data private (on device). In FL, each data holder trains a model locally and releases it to a central server for aggregation. In SL, the clients must release individual cut-layer activations (smashed data) to the server and wait for its response (during both inference and backpropagation). While relevant in several settings, both of these schemes have a high communication cost, rely on server-level computation algorithms, and do not allow for tunable levels of collaboration. In this work, we present a novel approach for privacy-preserving machine learning, where the clients collaborate via online knowledge distillation using a contrastive loss (contrastive w.r.t. the labels). The goal is to ensure that the participants learn similar features on similar classes without sharing their input data. To do so, each client releases averaged last hidden layer activations of similar labels to a central server that only acts as a relay (i.e., is not involved in the training or aggregation of the models). Then, the clients download these last layer activations (feature representations) of the ensemble of users and distill their knowledge in their personal models using a contrastive objective. For cross-device applications (i.e., small local datasets and limited computational capacity), this approach increases the utility of the models compared to independent learning and other federated knowledge distillation (FD) schemes, is communication efficient, and scales with the number of clients. We prove theoretically that our framework is well-posed, and we benchmark its performance against standard FD and FL on various datasets using different model architectures.
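To make the mechanism described in the abstract concrete, the snippet below is a minimal sketch (not the authors' reference implementation) of its two building blocks: averaging last-hidden-layer activations per label before releasing them to the relay, and a label-wise contrastive distillation loss that pulls a client's local features toward the downloaded same-class representations and away from the other classes. It assumes a PyTorch setting; the function names, the cosine-similarity formulation, and the `temperature` parameter are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn.functional as F


def class_averaged_representations(features, labels, num_classes):
    """Average last-hidden-layer activations per label (what a client would release to the relay)."""
    reps = torch.zeros(num_classes, features.size(1))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            reps[c] = features[mask].mean(dim=0)
    return reps


def contrastive_distillation_loss(local_features, labels, shared_reps, temperature=0.5):
    """Label-wise contrastive objective (illustrative): treat the downloaded per-class
    representations as targets, pulling each local feature toward its own class
    representation and pushing it away from the others."""
    # Similarity of every local feature (B, D) to every shared class representation (C, D) -> (B, C).
    logits = F.cosine_similarity(
        local_features.unsqueeze(1), shared_reps.unsqueeze(0), dim=2
    ) / temperature
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    # Toy check with random activations: 32 samples, 64-dim features, 10 classes.
    feats = torch.randn(32, 64)
    labels = torch.randint(0, 10, (32,))
    shared = class_averaged_representations(feats, labels, num_classes=10)
    print(contrastive_distillation_loss(feats, labels, shared).item())
```

In a full training loop, one natural reading of the abstract is that each client minimizes its usual task loss plus a weighted version of such a distillation term, with the weight providing the tunable level of collaboration that FL and SL are said to lack.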