Paper Title


Parameters or Privacy: A Provable Tradeoff Between Overparameterization and Membership Inference

Paper Authors

Tan, Jasper, Mason, Blake, Javadi, Hamid, Baraniuk, Richard G.

Paper Abstract


A surprising phenomenon in modern machine learning is the ability of a highly overparameterized model to generalize well (small error on the test data) even when it is trained to memorize the training data (zero error on the training data). This has led to an arms race towards increasingly overparameterized models (cf. deep learning). In this paper, we study an underexplored hidden cost of overparameterization: the fact that overparameterized models may be more vulnerable to privacy attacks, in particular the membership inference attack that predicts the (potentially sensitive) examples used to train a model. We significantly extend the relatively few empirical results on this problem by theoretically proving for an overparameterized linear regression model in the Gaussian data setting that membership inference vulnerability increases with the number of parameters. Moreover, a range of empirical studies indicates that more complex, nonlinear models exhibit the same behavior. Finally, we extend our analysis towards ridge-regularized linear regression and show in the Gaussian data setting that increased regularization also increases membership inference vulnerability in the overparameterized regime.
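The phenomenon the abstract describes can be illustrated with a minimal NumPy simulation: fit minimum-norm least squares on Gaussian data (which interpolates the training set once the parameter count exceeds the sample count) and run a simple loss-threshold membership inference attack. This is a hedged sketch of the general setting, not the authors' experimental code; the function name `mi_advantage`, the threshold rule, and all constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mi_advantage(n, d, sigma=0.1, trials=2000):
    """Loss-threshold membership inference against min-norm linear regression.

    Illustrative sketch (not the paper's code): train on n Gaussian samples
    in d dimensions, then compare squared residuals of training members
    against fresh non-members drawn from the same distribution. Returns the
    attack's advantage (true-positive rate minus false-positive rate) for a
    threshold set at the pooled median loss.
    """
    member_losses, nonmember_losses = [], []
    for _ in range(max(trials // n, 1)):
        w_star = rng.standard_normal(d) / np.sqrt(d)   # ground-truth weights
        X = rng.standard_normal((n, d))
        y = X @ w_star + sigma * rng.standard_normal(n)
        # Minimum-norm least-squares fit; interpolates the data when d > n.
        w_hat = np.linalg.pinv(X) @ y
        member_losses.extend((y - X @ w_hat) ** 2)
        # Fresh samples from the same Gaussian distribution (non-members).
        X_out = rng.standard_normal((n, d))
        y_out = X_out @ w_star + sigma * rng.standard_normal(n)
        nonmember_losses.extend((y_out - X_out @ w_hat) ** 2)
    tau = np.median(np.concatenate([member_losses, nonmember_losses]))
    tpr = np.mean(np.array(member_losses) <= tau)      # members flagged
    fpr = np.mean(np.array(nonmember_losses) <= tau)   # non-members flagged
    return tpr - fpr

# Advantage grows sharply once the model is overparameterized (d >> n):
# interpolation drives member losses to ~0, making them easy to distinguish.
adv_under = mi_advantage(n=20, d=10)    # underparameterized
adv_over = mi_advantage(n=20, d=200)    # overparameterized
```

In the underparameterized run the attack has only partial advantage, while in the interpolating regime member residuals collapse to numerical zero and the threshold attack separates members from non-members almost perfectly, mirroring the qualitative trend the paper proves for the Gaussian setting.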
