通过加权锤距离在二进制代码上快速搜索

论文标题

通过加权锤距离在二进制代码上快速搜索

Fast Search on Binary Codes by Weighted Hamming Distance

论文作者

Weng, Zhenyu, Zhu, Yuesheng, Liu, Ruixin

论文摘要

加权锤距离是二进制代码和二进制查询之间的相似性度量，在搜索任务中比锤击距离提供了更高的精度。但是，如何有效，准确地找到具有最小加权锤距离的$ K $二进制代码仍然是一个空旷的问题。在本文中，提出了一种快速的搜索算法，以通过加权锤距离对$ k $最近的二进制代码进行非排量搜索。通过将二进制代码用作哈希表中的直接存储量指数，搜索算法生成了一个序列，以根据每个位的权重的独立性特征来探测存储桶。此外，基于建议的搜索算法的快速搜索框架旨在解决长二进制代码的问题。具体而言，长二进制代码分为子字符串，并在其上构建了多个哈希表。然后，搜索算法根据生成的子字符串指数探测存储桶以获得候选物，并提出合并算法通过合并候选者来查找最近的二进制代码。理论分析和实验结果表明，搜索算法与其他非排便算法相比提高了搜索准确性，并且比线性扫描基线提供了更快的量质量搜索。

Weighted Hamming distance, as a similarity measure between binary codes and binary queries, provides superior accuracy in search tasks than Hamming distance. However, how to efficiently and accurately find $K$ binary codes that have the smallest weighted Hamming distance to the query remains an open issue. In this paper, a fast search algorithm is proposed to perform the non-exhaustive search for $K$ nearest binary codes by weighted Hamming distance. By using binary codes as direct bucket indices in a hash table, the search algorithm generates a sequence to probe the buckets based on the independence characteristic of the weights for each bit. Furthermore, a fast search framework based on the proposed search algorithm is designed to solve the problem of long binary codes. Specifically, long binary codes are split into substrings and multiple hash tables are built on them. Then, the search algorithm probes the buckets to obtain candidates according to the generated substring indices, and a merging algorithm is proposed to find the nearest binary codes by merging the candidates. Theoretical analysis and experimental results demonstrate that the search algorithm improves the search accuracy compared to other non-exhaustive algorithms and provides orders-of-magnitude faster search than the linear scan baseline.

下载PDF全文

下载文献需遵守相关版权规定

论文标题