Paper Title
Training Production Language Models without Memorizing User Data
Paper Authors
Paper Abstract
This paper presents the first consumer-scale next-word prediction (NWP) model trained with Federated Learning (FL) while leveraging the Differentially Private Federated Averaging (DP-FedAvg) technique. There has been prior work on building practical FL infrastructure, including work demonstrating the feasibility of training language models on mobile devices using such infrastructure. It has also been shown (in simulations on a public corpus) that NWP models can be trained with user-level differential privacy using the DP-FedAvg algorithm. Nevertheless, training production-quality NWP models with DP-FedAvg in a real-world production environment on a heterogeneous fleet of mobile phones requires addressing numerous challenges. For instance, the coordinating central server has to keep track of the devices available at the start of each round and sample devices uniformly at random from them, while ensuring \emph{secrecy of the sample}, among other requirements. Unlike all prior privacy-focused FL work of which we are aware, we demonstrate, for the first time, the deployment of a differentially private mechanism for training a production neural network in FL, as well as the instrumentation of the production training infrastructure to perform an end-to-end empirical measurement of unintended memorization.
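To make the DP-FedAvg mechanism referenced above concrete: at each round the server clips every participating user's model update to a fixed L2 norm bound, averages the clipped updates, and adds Gaussian noise calibrated to that bound. Below is a minimal sketch of the server-side aggregation step under those assumptions; the identifiers (`dp_fedavg_round`, `clip_norm`, `noise_multiplier`) are illustrative and are not taken from the paper's codebase.

```python
import numpy as np

def dp_fedavg_round(global_weights, client_deltas, clip_norm,
                    noise_multiplier, rng):
    """One DP-FedAvg round (sketch): clip each per-user model delta to
    L2 norm <= clip_norm, average, and add Gaussian noise whose scale
    is tied to the per-user sensitivity of the average."""
    clipped = []
    for delta in client_deltas:
        norm = np.linalg.norm(delta)
        # Scale the update down (never up) so its L2 norm is bounded.
        clipped.append(delta * min(1.0, clip_norm / max(norm, 1e-12)))
    avg = np.mean(clipped, axis=0)
    # After averaging n clipped updates, one user can move the result
    # by at most clip_norm / n, so the noise std is scaled accordingly.
    std = noise_multiplier * clip_norm / len(client_deltas)
    return global_weights + avg + rng.normal(0.0, std, size=avg.shape)

# Toy usage: 100 simulated client deltas on a 4-parameter model.
rng = np.random.default_rng(0)
weights = np.zeros(4)
deltas = [rng.normal(size=4) for _ in range(100)]
weights = dp_fedavg_round(weights, deltas, clip_norm=1.0,
                          noise_multiplier=1.0, rng=rng)
```

The end-to-end measurement of unintended memorization in this line of work is typically done with a canary-based exposure methodology in the style of Carlini et al.: random canary phrases are planted in some users' training data, and after training the model's loss on each canary is ranked against its losses on random candidate phrases that were never trained on. A hedged sketch of that exposure computation, again with hypothetical names:

```python
def exposure(canary_loss, candidate_losses):
    """Exposure (sketch): log2 of the candidate-space size minus log2
    of the canary's rank; higher exposure means more memorization."""
    rank = 1 + sum(loss < canary_loss for loss in candidate_losses)
    return np.log2(len(candidate_losses) + 1) - np.log2(rank)
```

An exposure near zero means the canary is no more likely under the model than an untrained random phrase, which is the outcome one hopes the DP training above produces.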