Paper Title
Representation Learning for High-Dimensional Data Collection under Local Differential Privacy
Paper Authors
Paper Abstract
The collection of individuals' data has become commonplace in many industries. Local differential privacy (LDP) offers a rigorous approach to preserving privacy whereby the individual privatises their data locally, allowing only their perturbed datum to leave their possession. LDP thus provides a provable privacy guarantee to the individual against both adversaries and database administrators. Existing LDP mechanisms have successfully been applied to low-dimensional data, but in high dimensions the privacy-inducing noise largely destroys the utility of the data. In this work, our contributions are two-fold: first, by adapting state-of-the-art techniques from representation learning, we introduce a novel approach to learning LDP mechanisms. These mechanisms add noise to powerful representations on the low-dimensional manifold underlying the data, thereby overcoming the prohibitive noise requirements of LDP in high dimensions. Second, we introduce a novel denoising approach for downstream model learning. The training of performant machine learning models using collected LDP data is a common goal for data collectors, and downstream model performance forms a proxy for the LDP data utility. Our approach significantly outperforms current state-of-the-art LDP mechanisms.
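The paper's learned mechanism is not reproduced here, but the core idea of the abstract, adding privacy noise to a low-dimensional representation of a high-dimensional datum rather than to the datum itself, can be illustrated with a minimal sketch. The encoder, the L1 clipping bound, and the Laplace calibration below are illustrative assumptions under a generic local-privatisation setup, not the learned mechanism or denoising approach described in the paper.

# Illustrative sketch only: privatise a high-dimensional datum by encoding it
# to a low-dimensional representation, clipping, and adding Laplace noise.
# The encoder, clip_norm, and noise calibration are assumptions for
# illustration and are NOT the paper's learned LDP mechanism.
import numpy as np

def privatise(x, encoder, epsilon, clip_norm=1.0, rng=None):
    """Encode x to a low-dimensional vector, clip its L1 norm to clip_norm,
    then add Laplace noise so the released vector satisfies epsilon-LDP
    for the clipped representation.

    encoder: callable mapping x to a k-dimensional vector (assumed given).
    epsilon: local privacy budget.
    clip_norm: bound C on the representation's L1 norm (assumed).
    """
    rng = np.random.default_rng() if rng is None else rng
    z = np.asarray(encoder(x), dtype=float)

    # Clip so that any two clipped representations differ by at most 2*C in L1.
    l1 = np.abs(z).sum()
    if l1 > clip_norm:
        z = z * (clip_norm / l1)

    # Laplace noise with scale 2*C/epsilon covers an L1 sensitivity of 2*C,
    # giving an epsilon-LDP guarantee for the clipped representation.
    noise = rng.laplace(loc=0.0, scale=2.0 * clip_norm / epsilon, size=z.shape)
    return z + noise

Because each individual applies privatise locally before their representation leaves their device, only the noisy low-dimensional vector is collected; the dimensionality of the noise scales with the representation rather than with the raw data, which is the advantage the abstract highlights.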