Paper Title

Locally Differentially Private Distributed Deep Learning via Knowledge Distillation

Paper Authors

Di Zhuang, Mingchen Li, J. Morris Chang

Paper Abstract

Deep learning often requires a large amount of data. In real-world applications, e.g., healthcare, the data collected by a single organization (e.g., a hospital) is often limited, and the majority of massive and diverse data is segregated across multiple organizations. This motivates distributed deep learning, where a data user would like to build DL models using data segregated across multiple different data owners. However, this raises severe privacy concerns due to the sensitive nature of the data, so data owners may be hesitant to participate. We propose LDP-DL, a privacy-preserving distributed deep learning framework via local differential privacy and knowledge distillation, where each data owner learns a teacher model using its own (local) private dataset, and the data user learns a student model to mimic the output of the ensemble of teacher models. In the experimental evaluation, a comprehensive comparison is made among our proposed approach (i.e., LDP-DL), DP-SGD, PATE and DP-FL on three popular deep learning benchmark datasets (i.e., CIFAR10, MNIST and FashionMNIST). The experimental results show that LDP-DL consistently outperforms the other competitors in terms of privacy budget and model accuracy.
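To make the teacher–student workflow described in the abstract concrete, below is a minimal, hypothetical Python sketch, not the paper's actual implementation. It assumes each data owner adds Laplace noise to its teacher's soft predictions before sharing them (a stand-in for whatever local differential privacy mechanism LDP-DL actually uses), and the data user averages the noisy predictions into soft labels to distill a toy softmax-regression student. All function names, the noise scale, and the toy models are illustrative assumptions.

```python
# Minimal, hypothetical sketch of the teacher-student workflow described in the
# abstract. Assumptions (not taken from the paper): teachers are black-box
# predict_proba-style functions, local differential privacy is approximated by
# Laplace noise that each data owner adds to its own soft predictions before
# sharing them, and the student is distilled from the averaged noisy predictions.
import numpy as np

rng = np.random.default_rng(0)

def laplace_perturb(probs, epsilon, sensitivity=1.0):
    """Perturb one teacher's soft predictions on the data owner's side."""
    noisy = probs + rng.laplace(scale=sensitivity / epsilon, size=probs.shape)
    noisy = np.clip(noisy, 1e-8, None)
    return noisy / noisy.sum(axis=-1, keepdims=True)  # renormalize to distributions

def aggregate_noisy_teachers(teachers, public_x, epsilon):
    """Data user averages the locally perturbed predictions into soft labels."""
    noisy_votes = [laplace_perturb(teacher(public_x), epsilon) for teacher in teachers]
    return np.mean(noisy_votes, axis=0)

def train_student(public_x, soft_labels, n_classes, lr=0.5, epochs=200):
    """Toy softmax-regression 'student' distilled from the aggregated soft labels."""
    w = np.zeros((public_x.shape[1], n_classes))
    for _ in range(epochs):
        logits = public_x @ w
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        w -= lr * public_x.T @ (probs - soft_labels) / len(public_x)  # cross-entropy grad
    return w

# Tiny usage example with synthetic data and fake pre-trained teachers.
public_x = rng.normal(size=(256, 5))   # unlabeled public data held by the data user
true_w = rng.normal(size=(5, 2))

def make_teacher(perturbation):
    def teacher(x):
        logits = x @ (true_w + perturbation)
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        return p / p.sum(axis=1, keepdims=True)
    return teacher

teachers = [make_teacher(rng.normal(scale=0.1, size=true_w.shape)) for _ in range(5)]
soft_labels = aggregate_noisy_teachers(teachers, public_x, epsilon=1.0)
student_w = train_student(public_x, soft_labels, n_classes=2)
print("student weight matrix shape:", student_w.shape)
```

The point the sketch mirrors is the placement of the noise: it is added on the data owner's side, so only already-perturbed predictions, never the raw private data or clean model outputs, leave each local dataset.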
