Paper Title

Lessons Learned: Defending Against Property Inference Attacks

Paper Authors

Joshua Stock, Jens Wettlaufer, Daniel Demmler, Hannes Federrath

Paper Abstract

This work investigates and evaluates multiple defense strategies against property inference attacks (PIAs), a privacy attack against machine learning models. Given a trained machine learning model, PIAs aim to extract statistical properties of its underlying training data, e.g., reveal the ratio of men and women in a medical training data set. While for other privacy attacks like membership inference, a lot of research on defense mechanisms has been published, this is the first work focusing on defending against PIAs. With the primary goal of developing a generic mitigation strategy against white-box PIAs, we propose the novel approach property unlearning. Extensive experiments with property unlearning show that while it is very effective when defending target models against specific adversaries, property unlearning is not able to generalize, i.e., protect against a whole class of PIAs. To investigate the reasons behind this limitation, we present the results of experiments with the explainable AI tool LIME. They show how state-of-the-art property inference adversaries with the same objective focus on different parts of the target model. We further elaborate on this with a follow-up experiment, in which we use the visualization technique t-SNE to exhibit how severely statistical training data properties are manifested in machine learning models. Based on this, we develop the conjecture that post-training techniques like property unlearning might not suffice to provide the desirable generic protection against PIAs. As an alternative, we investigate the effects of simpler training data preprocessing methods like adding Gaussian noise to images of a training data set on the success rate of PIAs. We conclude with a discussion of the different defense approaches, summarize the lessons learned and provide directions for future work.
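
As a concrete illustration of the preprocessing defense named in the abstract, the sketch below adds Gaussian noise to a set of training images before the target model is trained. It is a minimal example assuming NumPy arrays with pixel values in [0, 1]; the noise scale sigma and the clipping step are illustrative choices, not the paper's exact parameters.

import numpy as np

def add_gaussian_noise(images, sigma=0.05, rng=None):
    """Perturb training images with Gaussian noise before training.

    Assumes pixel values in [0, 1]. sigma is a hypothetical noise scale
    that trades target-model utility against property-inference leakage.
    """
    rng = rng or np.random.default_rng()
    noisy = images + rng.normal(loc=0.0, scale=sigma, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)  # keep pixels in the valid range

# Usage: perturb the data set once, then train the target model on it, e.g.
# x_train = add_gaussian_noise(x_train, sigma=0.05)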
