论文标题

英语:优化的高维knn搜索与布雷格曼距离

BrePartition: Optimized High-Dimensional kNN Search with Bregman Distances

论文作者

Song, Yang, Gu, Yu, Zhang, Rui, Yu, Ge

论文摘要

Bregman距离(也称为Bregman Diverence)广泛用于机器学习,语音识别和信号处理,而随着多媒体应用的快速发展,具有Bregman距离的KNN搜索变得越来越重要。多媒体应用程序(例如图像和视频)中的数据通常转化为数百个维度的空间。这种高维空间对带有Bregman距离的现有KNN搜索算法提出了重大挑战,该算法只能处理中等维度的数据(通常小于100)。本文解决了与Bregman距离的高维KNN搜索的紧迫问题。我们提出了一个新颖的分区滤波器框架。具体而言,我们提出了一个优化的维度分配方案,以解决几个非平凡问题。首先,从每个分区子空间中有效绑定以获得精确的KNN结果。其次,我们对优化数量的分区进行了深入分析,并制定了有效的分区策略。第三,我们为所有子空间设计一个有效的集成索引结构,以加速搜索处理。此外,我们通过精确性和效率之间的权衡将精确的解决方案扩展到近似版本。与最新算法相比,四个现实世界数据集和两个合成数据集的实验结果显示了我们方法的明显优势。

Bregman distances (also known as Bregman divergences) are widely used in machine learning, speech recognition and signal processing, and kNN searches with Bregman distances have become increasingly important with the rapid advances of multimedia applications. Data in multimedia applications such as images and videos are commonly transformed into space of hundreds of dimensions. Such high-dimensional space has posed significant challenges for existing kNN search algorithms with Bregman distances, which could only handle data of medium dimensionality (typically less than 100). This paper addresses the urgent problem of high-dimensional kNN search with Bregman distances. We propose a novel partition-filter-refinement framework. Specifically, we propose an optimized dimensionality partitioning scheme to solve several non-trivial issues. First, an effective bound from each partitioned subspace to obtain exact kNN results is derived. Second, we conduct an in-depth analysis of the optimized number of partitions and devise an effective strategy for partitioning. Third, we design an efficient integrated index structure for all the subspaces together to accelerate the search processing. Moreover, we extend our exact solution to an approximate version by a trade-off between the accuracy and efficiency. Experimental results on four real-world datasets and two synthetic datasets show the clear advantage of our method in comparison to state-of-the-art algorithms.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源