论文标题
有效差异私人$ f_0 $线性草图
Efficient Differentially Private $F_0$ Linear Sketching
论文作者
论文摘要
线性草图的一个强大功能是,从两个数据向量的草图中,一个可以计算向量之间差异的草图。这使我们可以回答有关两个数据集之间差异的细粒度问题。在这项工作中,我们考虑如何为加权$ f_0 $(即数据集中元素的总权重)构造草图,这些元素的总权重很小,差异化且在计算上有效。 Let a weight vector $w\in(0,1]^u$ be given. For $x\in\{0,1\}^u$ we are interested in estimating $\Vert x\circ w\Vert_1$ where $\circ$ is the Hadamard product (entrywise product). Building on a technique of Kushilevitz et al.~(STOC 1998), we introduce a sketch (depending on $w$)这是在\ {0,1 \}^u $ in \ in \ {0,1 \}^in \ in \ {0,1 \}^τ$中绘制$ x \ in \ {0,1 \}^τ$的矢量$ x \的线性。我们表明,对于$ 0 <β<1 $和$ \ varepsilon = o(1)$的每一个选择都存在$ p <1/2 $和分布$ \ nathcal {h} $ size $ $τ= o(\ log log^2(u log^2(u)\ varepsilon^v varepsilon^{ - 2} {-2} $的线性草图$(\ log log^2(u)给定$ h \ sim \ sim \ Mathcal {h} $和噪声向量$φ$,给定$ hx +φ$我们可以计算$ \ vert x \ circ w \ circ w \ vert_1 $在因子$ 1 \pmβ$,pmβ$,pmβ$,加上添加错误$ o(加上添加错误$ o(\ logepsilon^概率)中,这是准确的$ 1-1/u $,2)对于每$ h \ sim \ mathcal {h} $,$ hx +φ$ is $ \ varepsilon $ -differtial private private private in $ $φ$中的随机性。特殊情况$ w =(1,\ dots,1)$是未加权的$ f_0 $。我们的结果既提高了未加权$ F_0 $估算的现有方法的效率,并扩展到加权概括。我们还提供了分布式流的实现,以估计两个输入流之间的联合大小。
A powerful feature of linear sketches is that from sketches of two data vectors, one can compute the sketch of the difference between the vectors. This allows us to answer fine-grained questions about the difference between two data sets. In this work, we consider how to construct sketches for weighted $F_0$, i.e., the summed weights of the elements in the data set, that are small, differentially private, and computationally efficient. Let a weight vector $w\in(0,1]^u$ be given. For $x\in\{0,1\}^u$ we are interested in estimating $\Vert x\circ w\Vert_1$ where $\circ$ is the Hadamard product (entrywise product). Building on a technique of Kushilevitz et al.~(STOC 1998), we introduce a sketch (depending on $w$) that is linear over GF(2), mapping a vector $x\in \{0,1\}^u$ to $Hx\in\{0,1\}^τ$ for a matrix $H$ sampled from a suitable distribution $\mathcal{H}$. Differential privacy is achieved by using randomized response, flipping each bit of $Hx$ with probability $p<1/2$. We show that for every choice of $0<β< 1$ and $\varepsilon=O(1)$ there exists $p<1/2$ and a distribution $\mathcal{H}$ of linear sketches of size $τ= O(\log^2(u)\varepsilon^{-2}β^{-2})$ such that: 1) For random $H\sim\mathcal{H}$ and noise vector $φ$, given $Hx + φ$ we can compute an estimate of $\Vert x\circ w\Vert_1$ that is accurate within a factor $1\pmβ$, plus additive error $O(\log(u)\varepsilon^{-2}β^{-2})$, with probability $1-1/u$, and 2) For every $H\sim\mathcal{H}$, $Hx + φ$ is $\varepsilon$-differentially private over the randomness in $φ$. The special case $w=(1,\dots,1)$ is unweighted $F_0$. Our results both improve the efficiency of existing methods for unweighted $F_0$ estimating and extend to a weighted generalization. We also give a distributed streaming implementation for estimating the size of the union between two input streams.