论文标题

使用概率数据结构在流数据中进行局部差异私人模糊计数

Local Differentially Private Fuzzy Counting in Stream Data using Probabilistic Data Structure

论文作者

Vatsalan, Dinusha, Bhaskar, Raghav, Kaafar, Mohamed Ali

论文摘要

流媒体数据中项目计数的隐私性估计可在几种现实世界中找到应用程序,包括单词自动纠正和流量管理应用程序。拉录和苹果的伯爵摩sket草(CMS)算法的最新作品提出了使用概率数据结构(例如计数bloom滤镜和cms)进行大量数据中计数估算的隐私机制。但是,这些现有方法在为实时流数据应用程序提供声音解决方案时缺乏。在这项工作中,我们提出了一种新颖的(本地)差异性私人机制,该机制为流式数据计数估计问题提供了高实用性,即相似甚至较低的隐私预算,同时提供:a)模糊计数以报告相关或类似项目的计数(例如,计算键入错误和数据变量),以及b)提高Query Query Querying Query Query Query的响应效率,以减少实时时间的响应时间。我们为隐私和公用事业提供正式证明,并使用真实和合成的英语单词数据集对我们的算法进行广泛的实验评估,以确切和模糊计数。我们的隐私保留机制在较低的查询时间方面大大优于先前的工作,在相似或较低的隐私保证下,以沟通费用为开销,在相似或较低的隐私保证下的效用(准确性估算的准确性)。

Privacy-preserving estimation of counts of items in streaming data finds applications in several real-world scenarios including word auto-correction and traffic management applications. Recent works of RAPPOR and Apple's count-mean sketch (CMS) algorithm propose privacy preserving mechanisms for count estimation in large volumes of data using probabilistic data structures like counting Bloom filter and CMS. However, these existing methods fall short in providing a sound solution for real-time streaming data applications. In this work, we propose a novel (local) Differentially private mechanism that provides high utility for the streaming data count estimation problem with similar or even lower privacy budgets while providing: a) fuzzy counting to report counts of related or similar items (for instance to account for typing errors and data variations), and b) improved querying efficiency to reduce the response time for real-time querying of counts. We provide formal proofs for privacy and utility guarantees and present extensive experimental evaluation of our algorithm using real and synthetic English words datasets for both the exact and fuzzy counting scenarios. Our privacy preserving mechanism substantially outperforms the prior work in terms of lower querying time, significantly higher utility (accuracy of count estimation) under similar or lower privacy guarantees, at the cost of communication overhead.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源