Paper Title
Differentially Private Deep Learning with ModelMix
Paper Authors
Paper Abstract
Training large neural networks with meaningful/usable differential privacy security guarantees is a demanding challenge. In this paper, we tackle this problem by revisiting the two key operations in Differentially Private Stochastic Gradient Descent (DP-SGD): 1) iterative perturbation and 2) gradient clipping. We propose a generic optimization framework, called {\em ModelMix}, which performs random aggregation of intermediate model states. It strengthens the composite privacy analysis by utilizing the entropy of the training trajectory and improves the $(ε, δ)$ DP security parameters by an order of magnitude. We provide rigorous analyses for both the utility guarantees and privacy amplification of ModelMix. In particular, we present a formal study on the effect of gradient clipping in DP-SGD, which provides theoretical guidance on how hyper-parameters should be selected. We also introduce a refined gradient clipping method, which can further sharpen the privacy loss in private learning when combined with ModelMix. Thorough experiments with significant privacy/utility improvements are presented to support our theory. We train a Resnet-20 network on CIFAR10 with $70.4\%$ accuracy via ModelMix given an $(ε=8, δ=10^{-5})$ DP-budget, compared to the same performance but with $(ε=145.8,δ=10^{-5})$ using regular DP-SGD; assisted with additional public low-dimensional gradient embedding, one can further improve the accuracy to $79.1\%$ with an $(ε=6.1, δ=10^{-5})$ DP-budget, compared to the same performance but with $(ε=111.2, δ=10^{-5})$ without ModelMix.
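The abstract describes DP-SGD's two key operations (per-sample gradient clipping and iterative noise perturbation) and ModelMix's random aggregation of intermediate model states. The following is a minimal illustrative sketch of one such update step; the function name `dp_sgd_modelmix_step`, the uniform mixing coefficient, and all hyper-parameter values are assumptions chosen for illustration, not the paper's exact algorithm or noise calibration.

```python
# Minimal sketch of a DP-SGD step with a ModelMix-style state aggregation.
# The mixing rule and parameters below are illustrative assumptions.
import numpy as np

def dp_sgd_modelmix_step(w, prev_w, per_sample_grads, lr=0.1,
                         clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One illustrative update: clip per-sample gradients, add Gaussian
    noise (iterative perturbation), then randomly mix the new model state
    with a previous intermediate state (ModelMix-style aggregation)."""
    rng = rng or np.random.default_rng()

    # 1) Gradient clipping: bound each per-sample gradient's L2 norm.
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_sample_grads]
    grad = np.mean(clipped, axis=0)

    # 2) Iterative perturbation: add Gaussian noise scaled to the clip norm.
    noise = rng.normal(0.0,
                       noise_multiplier * clip_norm / len(per_sample_grads),
                       size=grad.shape)
    w_new = w - lr * (grad + noise)

    # 3) ModelMix-style step: random convex combination of intermediate states.
    alpha = rng.uniform(0.0, 1.0)
    return alpha * w_new + (1.0 - alpha) * prev_w

# Toy usage on a 2-parameter model with 4 per-sample gradients.
w = np.zeros(2)
grads = [np.array([0.5, -0.2]), np.array([1.5, 0.3]),
         np.array([-0.4, 0.9]), np.array([0.2, 0.1])]
w = dp_sgd_modelmix_step(w, prev_w=np.zeros(2), per_sample_grads=grads)
print(w)
```

The random convex combination in step 3 is what injects extra entropy from the training trajectory into the released model state, which is the intuition behind the amplified $(ε, δ)$ accounting claimed in the abstract.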