Paper Title
Differentially Private Deep Learning with ModelMix
Paper Authors
Paper Abstract
Training large neural networks with meaningful/usable differential privacy security guarantees is a demanding challenge. In this paper, we tackle this problem by revisiting the two key operations in Differentially Private Stochastic Gradient Descent (DP-SGD): 1) iterative perturbation and 2) gradient clipping. We propose a generic optimization framework, called {\em ModelMix}, which performs random aggregation of intermediate model states. It strengthens the composite privacy analysis by utilizing the entropy of the training trajectory and improves the $(ε, δ)$ DP security parameters by an order of magnitude. We provide rigorous analyses for both the utility guarantees and privacy amplification of ModelMix. In particular, we present a formal study on the effect of gradient clipping in DP-SGD, which provides theoretical guidance on how hyper-parameters should be selected. We also introduce a refined gradient clipping method, which can further sharpen the privacy loss in private learning when combined with ModelMix. Thorough experiments with significant privacy/utility improvements are presented to support our theory. We train a Resnet-20 network on CIFAR10 with $70.4\%$ accuracy via ModelMix given an $(ε=8, δ=10^{-5})$ DP-budget, compared to the same performance but with $(ε=145.8,δ=10^{-5})$ using regular DP-SGD; assisted with additional public low-dimensional gradient embedding, one can further improve the accuracy to $79.1\%$ with an $(ε=6.1, δ=10^{-5})$ DP-budget, compared to the same performance but with $(ε=111.2, δ=10^{-5})$ without ModelMix.
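The abstract describes DP-SGD's two key operations (per-sample gradient clipping and iterative noise perturbation) and ModelMix's random aggregation of intermediate model states. The following is a minimal illustrative sketch of one such update step; the function name `dp_sgd_modelmix_step`, the uniform mixing coefficient, and all hyper-parameter values are assumptions chosen for illustration, not the paper's exact algorithm or noise calibration.

```python
# Minimal sketch of a DP-SGD step with a ModelMix-style state aggregation.
# The mixing rule and parameters below are illustrative assumptions.
import numpy as np

def dp_sgd_modelmix_step(w, prev_w, per_sample_grads, lr=0.1,
                         clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One illustrative update: clip per-sample gradients, add Gaussian
    noise (iterative perturbation), then randomly mix the new model state
    with a previous intermediate state (ModelMix-style aggregation)."""
    rng = rng or np.random.default_rng()

    # 1) Gradient clipping: bound each per-sample gradient's L2 norm.
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_sample_grads]
    grad = np.mean(clipped, axis=0)

    # 2) Iterative perturbation: add Gaussian noise scaled to the clip norm.
    noise = rng.normal(0.0,
                       noise_multiplier * clip_norm / len(per_sample_grads),
                       size=grad.shape)
    w_new = w - lr * (grad + noise)

    # 3) ModelMix-style step: random convex combination of intermediate states.
    alpha = rng.uniform(0.0, 1.0)
    return alpha * w_new + (1.0 - alpha) * prev_w

# Toy usage on a 2-parameter model with 4 per-sample gradients.
w = np.zeros(2)
grads = [np.array([0.5, -0.2]), np.array([1.5, 0.3]),
         np.array([-0.4, 0.9]), np.array([0.2, 0.1])]
w = dp_sgd_modelmix_step(w, prev_w=np.zeros(2), per_sample_grads=grads)
print(w)
```

The random convex combination in step 3 is what injects extra entropy from the training trajectory into the released model state, which is the intuition behind the amplified $(ε, δ)$ accounting claimed in the abstract.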