Paper Title

Large Scale Transfer Learning for Differentially Private Image Classification

Paper Authors

Mehta, Harsh, Thakurta, Abhradeep, Kurakin, Alexey, Cutkosky, Ashok

Paper Abstract

Differential Privacy (DP) provides a formal framework for training machine learning models with individual example-level privacy. In the field of deep learning, Differentially Private Stochastic Gradient Descent (DP-SGD) has emerged as a popular private training algorithm. Unfortunately, the computational cost of training large-scale models with DP-SGD is substantially higher than that of non-private training. This is further exacerbated by the fact that increasing the number of parameters leads to larger degradation in utility with DP. In this work, we zoom in on the ImageNet dataset and demonstrate that, similar to the non-private case, pre-training over-parameterized models on a large public dataset can lead to substantial gains when the model is finetuned privately. Moreover, by systematically comparing private and non-private models across a range of large batch sizes, we find that, similar to the non-private setting, the choice of optimizer can further improve performance substantially with DP. By using the LAMB optimizer with DP-SGD, we saw improvements of up to 20 percentage points (absolute). Finally, we show that finetuning just the last layer for a \emph{single step} in the full-batch setting, combined with extremely small-scale (near-zero) initialization, leads to SOTA results of 81.7$\%$ under a wide privacy budget range of $\epsilon \in [4, 10]$ and $\delta = 10^{-6}$, while substantially minimizing the computational overhead.
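For readers unfamiliar with DP-SGD, the core of one update step is per-example gradient clipping followed by Gaussian noise addition before averaging into the model update. Below is a minimal NumPy sketch of that step; the function name, default hyperparameters, and the plain-SGD update are illustrative assumptions, not the paper's actual implementation (which pairs the private gradient with optimizers such as LAMB):

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0, rng=None):
    """One illustrative DP-SGD update.

    1. Clip each example's gradient to L2 norm <= clip_norm.
    2. Average the clipped gradients over the batch.
    3. Add Gaussian noise scaled by noise_multiplier * clip_norm / batch.
    4. Apply a plain SGD step with the privatized gradient.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down (never up) so the per-example norm is at most clip_norm.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    batch = len(per_example_grads)
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / batch, size=avg.shape)
    return params - lr * (avg + noise)
```

The clipping bounds each individual example's influence on the update, which is what makes the Gaussian noise sufficient for an ($\epsilon$, $\delta$)-DP guarantee; the per-example gradient computation is also the main source of the extra computational cost the abstract mentions.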
