Paper Title

Facilitating Federated Genomic Data Analysis by Identifying Record Correlations while Ensuring Privacy

Authors

Leonard Dervishi, Xinyue Wang, Wentao Li, Anisa Halimi, Jaideep Vaidya, Xiaoqian Jiang, Erman Ayday

Abstract

With the reduction of sequencing costs and the pervasiveness of computing devices, genomic data collection is continually growing. However, data collection is highly fragmented, and the data is still siloed across different repositories. Analyzing all of this data would be transformative for genomics research. However, the data is sensitive and therefore cannot easily be centralized. Furthermore, the data may contain correlated records, which, if not detected, can bias the analysis. In this paper, we take the first step towards identifying correlated records across multiple data repositories in a privacy-preserving manner. The proposed framework, based on random shuffling, synthetic record generation, and local differential privacy, allows a trade-off between accuracy and computational efficiency. An extensive evaluation on real genomic data from the OpenSNP dataset shows that the proposed solution is efficient and effective.
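The abstract names three building blocks: random shuffling, synthetic record generation, and local differential privacy. The sketch below shows one textbook way these pieces could fit together at a single repository before records leave it. It is not the paper's protocol. It assumes genotypes are encoded as 0/1/2 minor-allele counts and uses k-ary (generalized) randomized response as the local differential privacy mechanism. All function names (`grr_keep_probability`, `perturb_genotype`, `prepare_release`, `agreement`) are hypothetical.

```python
import math
import random

# Assumption: each SNP is encoded as the number of minor alleles (0, 1, or 2).
GENOTYPES = (0, 1, 2)


def grr_keep_probability(epsilon: float, k: int = len(GENOTYPES)) -> float:
    """Probability of reporting the true value under k-ary randomized response."""
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)


def perturb_genotype(value: int, epsilon: float, rng: random.Random) -> int:
    """Keep the true genotype with probability p, else report a uniformly chosen other one."""
    if rng.random() < grr_keep_probability(epsilon):
        return value
    return rng.choice([g for g in GENOTYPES if g != value])


def prepare_release(records, epsilon, n_synthetic, rng):
    """Hypothetical repository-side step: perturb, pad with synthetic records, shuffle.

    Returns a shuffled list of (is_real, record) pairs. The is_real flag is kept
    here only for illustration; a repository would not publish it.
    """
    n_snps = len(records[0])  # assumes at least one record
    release = [(True, [perturb_genotype(v, epsilon, rng) for v in rec]) for rec in records]
    for _ in range(n_synthetic):
        # Synthetic record: uniformly random genotypes at every SNP position.
        release.append((False, [rng.choice(GENOTYPES) for _ in range(n_snps)]))
    rng.shuffle(release)  # hide the original record order
    return release


def agreement(a, b) -> float:
    """Fraction of SNP positions at which two records report the same genotype."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(7)
    repo = [[rng.choice(GENOTYPES) for _ in range(200)] for _ in range(3)]
    release = prepare_release(repo, epsilon=2.0, n_synthetic=2, rng=rng)
    print(len(release))  # 5: three perturbed real records plus two synthetic ones
```

In this sketch, raising `epsilon` makes two perturbed copies of the same record agree more often, which makes correlated records easier to flag but weakens privacy. Adding more synthetic records hides the real ones among more candidates, at the cost of more pairwise comparisons.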