Paper Title
Differential Privacy of Hierarchical Census Data: An Optimization Approach
Paper Authors
Paper Abstract
This paper is motivated by applications of a Census Bureau interested in releasing aggregate socio-economic data about a large population without revealing sensitive information about any individual. The released information can be the number of individuals living alone, the number of cars they own, or their salary brackets. Recent events have identified some of the privacy challenges faced by these organizations. To address them, this paper presents a novel differential-privacy mechanism for releasing hierarchical counts of individuals. The counts are reported at multiple granularities (e.g., the national, state, and county levels) and must be consistent across all levels. The core of the mechanism is an optimization model that redistributes the noise introduced to achieve differential privacy in order to meet the consistency constraints between the hierarchical levels. The key technical contribution of the paper shows that this optimization problem can be solved in polynomial time by exploiting the structure of its cost functions. Experimental results on very large, real datasets show that the proposed mechanism provides improvements of up to two orders of magnitude in terms of computational efficiency and accuracy with respect to other state-of-the-art techniques.
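The two-step idea in the abstract (add noise for privacy, then redistribute it so the levels agree) can be illustrated with a minimal sketch. This is not the paper's optimization model or its polynomial-time algorithm: it assumes a two-level hierarchy (one parent, several children), Laplace noise with the privacy budget split evenly across the two levels, and a plain least-squares projection that ignores integrality and non-negativity of counts. The names `laplace_noise`, `noisy_counts`, and `make_consistent` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(parent, children, epsilon, rng):
    # Assumption: each individual is counted once per level, so each
    # level has sensitivity 1 and receives half of the budget epsilon.
    scale = 2.0 / epsilon
    noisy_parent = parent + laplace_noise(scale, rng)
    noisy_children = [c + laplace_noise(scale, rng) for c in children]
    return noisy_parent, noisy_children


def make_consistent(noisy_parent, noisy_children):
    # Least-squares projection onto the constraint parent == sum(children):
    # the mismatch is spread equally over the n + 1 counts involved.
    n = len(noisy_children)
    residual = noisy_parent - sum(noisy_children)
    shift = residual / (n + 1)
    return noisy_parent - shift, [c + shift for c in noisy_children]


if __name__ == "__main__":
    rng = random.Random(0)
    state, counties = make_consistent(
        *noisy_counts(1000, [400, 350, 250], epsilon=1.0, rng=rng)
    )
    # After the projection, the state count equals the sum of county counts
    # up to floating-point error.
    print(state, sum(counties))
```

Because the projection only post-processes already-noisy values, it does not consume additional privacy budget. A census release would also need non-negative integer counts, which this sketch does not enforce.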