Paper Title
Representation Learning for High-Dimensional Data Collection under Local Differential Privacy
Paper Authors
Paper Abstract
The collection of individuals' data has become commonplace in many industries. Local differential privacy (LDP) offers a rigorous approach to preserving privacy whereby the individual privatises their data locally, allowing only their perturbed datum to leave their possession. LDP thus provides a provable privacy guarantee to the individual against both adversaries and database administrators. Existing LDP mechanisms have successfully been applied to low-dimensional data, but in high dimensions the privacy-inducing noise largely destroys the utility of the data. In this work, our contributions are two-fold: first, by adapting state-of-the-art techniques from representation learning, we introduce a novel approach to learning LDP mechanisms. These mechanisms add noise to powerful representations on the low-dimensional manifold underlying the data, thereby overcoming the prohibitive noise requirements of LDP in high dimensions. Second, we introduce a novel denoising approach for downstream model learning. The training of performant machine learning models using collected LDP data is a common goal for data collectors, and downstream model performance forms a proxy for the LDP data utility. Our approach significantly outperforms current state-of-the-art LDP mechanisms.
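The paper's learned mechanism is not reproduced here, but the core idea of the abstract, adding privacy noise to a low-dimensional representation of a high-dimensional datum rather than to the datum itself, can be illustrated with a minimal sketch. The encoder, the L1 clipping bound, and the Laplace calibration below are illustrative assumptions under a generic local-privatisation setup, not the learned mechanism or denoising approach described in the paper.

# Illustrative sketch only: privatise a high-dimensional datum by encoding it
# to a low-dimensional representation, clipping, and adding Laplace noise.
# The encoder, clip_norm, and noise calibration are assumptions for
# illustration and are NOT the paper's learned LDP mechanism.
import numpy as np

def privatise(x, encoder, epsilon, clip_norm=1.0, rng=None):
    """Encode x to a low-dimensional vector, clip its L1 norm to clip_norm,
    then add Laplace noise so the released vector satisfies epsilon-LDP
    for the clipped representation.

    encoder: callable mapping x to a k-dimensional vector (assumed given).
    epsilon: local privacy budget.
    clip_norm: bound C on the representation's L1 norm (assumed).
    """
    rng = np.random.default_rng() if rng is None else rng
    z = np.asarray(encoder(x), dtype=float)

    # Clip so that any two clipped representations differ by at most 2*C in L1.
    l1 = np.abs(z).sum()
    if l1 > clip_norm:
        z = z * (clip_norm / l1)

    # Laplace noise with scale 2*C/epsilon covers an L1 sensitivity of 2*C,
    # giving an epsilon-LDP guarantee for the clipped representation.
    noise = rng.laplace(loc=0.0, scale=2.0 * clip_norm / epsilon, size=z.shape)
    return z + noise

Because each individual applies privatise locally before their representation leaves their device, only the noisy low-dimensional vector is collected; the dimensionality of the noise scales with the representation rather than with the raw data, which is the advantage the abstract highlights.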