Paper Title

PrivSyn: Differentially Private Data Synthesis

Authors

Zhikun Zhang, Tianhao Wang, Ninghui Li, Jean Honorio, Michael Backes, Shibo He, Jiming Chen, Yang Zhang

Abstract

In differential privacy (DP), a challenging problem is to generate synthetic datasets that efficiently capture the useful information in the private data. A synthetic dataset allows any task to be performed without privacy concerns and without modifying existing algorithms. In this paper, we present PrivSyn, the first automatic synthetic data generation method that can handle general tabular datasets (with 100 attributes and domain size $>2^{500}$). PrivSyn consists of a new method to automatically and privately identify correlations in the data, and a novel method to generate sample data from a dense graphical model. We extensively evaluate different methods on multiple datasets to demonstrate the performance of our method.
