论文标题
在差异隐私下发布随机方法的信息理论方法
An Information Theoretic approach to Post Randomization Methods under Differential Privacy
论文作者
论文摘要
后随机方法(PRAM)是分类和连续数据的最流行的披露限制技术之一。在分类情况下,给定一个随机矩阵$ m $和指定变量,属于类别$ i $的个人被更改为类别$ j $,带有概率$ m_ {i,j} $。选择随机矩阵$ m $的每种方法都必须在两个desiderata之间进行平衡:1)从原始数据中保留尽可能多的统计信息; 2)保证个人在数据集中的隐私。这种权衡通常已被证明是非常具有挑战性的解决方案。在这项工作中,我们使用计算机科学文献中的最新工具,并建议选择$ m $作为解决受限最大化问题的解决方案。具体而言,选择$ m $作为解决受约束最大化问题的解决方案,在这种情况下,我们最大程度地提高了原始数据和转换数据之间的相互信息,鉴于转换满足差异隐私的概念的约束。对于一般的分类模型,它显示了该最大化问题如何减少到凸线性编程,因此可以通过已知的优化算法解决。
Post Randomization Methods (PRAM) are among the most popular disclosure limitation techniques for both categorical and continuous data. In the categorical case, given a stochastic matrix $M$ and a specified variable, an individual belonging to category $i$ is changed to category $j$ with probability $M_{i,j}$. Every approach to choose the randomization matrix $M$ has to balance between two desiderata: 1) preserving as much statistical information from the raw data as possible; 2) guaranteeing the privacy of individuals in the dataset. This trade-off has generally been shown to be very challenging to solve. In this work, we use recent tools from the computer science literature and propose to choose $M$ as the solution of a constrained maximization problems. Specifically, $M$ is chosen as the solution of a constrained maximization problem, where we maximize the Mutual Information between raw and transformed data, given the constraint that the transformation satisfies the notion of Differential Privacy. For the general Categorical model, it is shown how this maximization problem reduces to a convex linear programming and can be therefore solved with known optimization algorithms.