论文标题

将Johnson-Lindenstrauss嵌入二重

Binarized Johnson-Lindenstrauss embeddings

论文作者

Dirksen, Sjoerd, Stollenwerk, Alexander

论文摘要

我们考虑将一组向量编码为最小数量的位,同时保留其欧几里得几何形状的信息。我们表明,可以通过应用Johnson-Lindenstrauss嵌入,然后通过将向量的每个条目与均匀的随机阈值进行比较来完成此任务。使用这种简单的结构,我们生成了两个数据集的两个编码,以便可以使用少数位操作查询欧几里得信息,以查询一对点,最多可达所需的加性误差 - 第一种情况下的欧几里得距离以及第二个情况下的内部产品和平方的欧几里得距离。在后一种情况下,每个点在接近线性的时间内进行编码。这些编码所需的位数是根据数据集的两个自然复杂性参数(其覆盖数和局部高斯复杂性)量化的,并显示为几乎最佳的。

We consider the problem of encoding a set of vectors into a minimal number of bits while preserving information on their Euclidean geometry. We show that this task can be accomplished by applying a Johnson-Lindenstrauss embedding and subsequently binarizing each vector by comparing each entry of the vector to a uniformly random threshold. Using this simple construction we produce two encodings of a dataset such that one can query Euclidean information for a pair of points using a small number of bit operations up to a desired additive error - Euclidean distances in the first case and inner products and squared Euclidean distances in the second. In the latter case, each point is encoded in near-linear time. The number of bits required for these encodings is quantified in terms of two natural complexity parameters of the dataset - its covering numbers and localized Gaussian complexity - and shown to be near-optimal.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源