有效差异私人$ f_0 $线性草图

论文标题

有效差异私人$ f_0 $线性草图

Efficient Differentially Private $F_0$ Linear Sketching

论文作者

Pagh, Rasmus, Stausholm, Nina Mesing

论文摘要

线性草图的一个强大功能是，从两个数据向量的草图中，一个可以计算向量之间差异的草图。这使我们可以回答有关两个数据集之间差异的细粒度问题。在这项工作中，我们考虑如何为加权$ f_0 $（即数据集中元素的总权重）构造草图，这些元素的总权重很小，差异化且在计算上有效。 Let a weight vector $w\in(0,1]^u$ be given. For $x\in\{0,1\}^u$ we are interested in estimating $\Vert x\circ w\Vert_1$ where $\circ$ is the Hadamard product (entrywise product). Building on a technique of Kushilevitz et al.~(STOC 1998), we introduce a sketch (depending on $w$)这是在\ {0,1 \}^u $ in \ in \ {0,1 \}^in \ in \ {0,1 \}^τ$中绘制$ x \ in \ {0,1 \}^τ$的矢量$ x \的线性。我们表明，对于$ 0 <β<1 $和$ \ varepsilon = o（1）$的每一个选择都存在$ p <1/2 $和分布$ \ nathcal {h} $ size $ $τ= o（\ log log^2（u log^2（u）\ varepsilon^v varepsilon^{ - 2} {-2} $的线性草图$（\ log log^2（u）给定$ h \ sim \ sim \ Mathcal {h} $和噪声向量$φ$，给定$ hx +φ$我们可以计算$ \ vert x \ circ w \ circ w \ vert_1 $在因子$ 1 \pmβ$，pmβ$，pmβ$，加上添加错误$ o（加上添加错误$ o（\ logepsilon^概率）中，这是准确的$ 1-1/u $，2）对于每$ h \ sim \ mathcal {h} $，$ hx +φ$ is $ \ varepsilon $ -differtial private private private in $ $φ$中的随机性。特殊情况$ w =（1，\ dots，1）$是未加权的$ f_0 $。我们的结果既提高了未加权$ F_0 $估算的现有方法的效率，并扩展到加权概括。我们还提供了分布式流的实现，以估计两个输入流之间的联合大小。

A powerful feature of linear sketches is that from sketches of two data vectors, one can compute the sketch of the difference between the vectors. This allows us to answer fine-grained questions about the difference between two data sets. In this work, we consider how to construct sketches for weighted $F_0$, i.e., the summed weights of the elements in the data set, that are small, differentially private, and computationally efficient. Let a weight vector $w\in(0,1]^u$ be given. For $x\in\{0,1\}^u$ we are interested in estimating $\Vert x\circ w\Vert_1$ where $\circ$ is the Hadamard product (entrywise product). Building on a technique of Kushilevitz et al.~(STOC 1998), we introduce a sketch (depending on $w$) that is linear over GF(2), mapping a vector $x\in \{0,1\}^u$ to $Hx\in\{0,1\}^τ$ for a matrix $H$ sampled from a suitable distribution $\mathcal{H}$. Differential privacy is achieved by using randomized response, flipping each bit of $Hx$ with probability $p<1/2$. We show that for every choice of $0<β< 1$ and $\varepsilon=O(1)$ there exists $p<1/2$ and a distribution $\mathcal{H}$ of linear sketches of size $τ= O(\log^2(u)\varepsilon^{-2}β^{-2})$ such that: 1) For random $H\sim\mathcal{H}$ and noise vector $φ$, given $Hx + φ$ we can compute an estimate of $\Vert x\circ w\Vert_1$ that is accurate within a factor $1\pmβ$, plus additive error $O(\log(u)\varepsilon^{-2}β^{-2})$, with probability $1-1/u$, and 2) For every $H\sim\mathcal{H}$, $Hx + φ$ is $\varepsilon$-differentially private over the randomness in $φ$. The special case $w=(1,\dots,1)$ is unweighted $F_0$. Our results both improve the efficiency of existing methods for unweighted $F_0$ estimating and extend to a weighted generalization. We also give a distributed streaming implementation for estimating the size of the union between two input streams.

下载PDF全文

下载文献需遵守相关版权规定

论文标题