Paper title
Return of the Lernaean Hydra: Experimental Evaluation of Data Series Approximate Similarity Search
Authors
Abstract
Data series are a special type of multidimensional data present in numerous domains, where similarity search is a key operation that has been extensively studied in the data series literature. In parallel, the multidimensional community has studied approximate similarity search techniques. We propose a taxonomy of similarity search techniques that reconciles the terminology used in these two domains, we describe modifications to data series indexing techniques enabling them to answer approximate similarity queries with quality guarantees, and we conduct a thorough experimental evaluation to compare approximate similarity search techniques under a unified framework, on synthetic and real datasets, in memory and on disk. Although data series differ from generic multidimensional vectors (series usually exhibit correlation between neighboring values), our results show that data series techniques answer approximate queries with strong guarantees and excellent empirical performance, on data series and vectors alike. These techniques outperform the state-of-the-art approximate techniques for vectors when operating on disk, and remain competitive in memory.
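The abstract refers to answering approximate similarity queries with quality guarantees. As a minimal sketch, assuming the common epsilon-approximate definition (the returned neighbor's distance is at most (1 + eps) times the exact nearest-neighbor distance), the Python code below shows how a lower bound on the true distance lets a search stop early without losing that guarantee. It is not the authors' modified index: a flat scan over PAA (Piecewise Aggregate Approximation) summaries stands in for a tree index, and the helper names `paa`, `paa_lower_bound`, `eps_approx_nn`, and `random_walk` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def paa(series: Sequence[float], segments: int) -> List[float]:
    """PAA summary: the mean of each equal-length segment.

    Assumes len(series) is divisible by `segments`.
    """
    size = len(series) // segments
    return [sum(series[i * size:(i + 1) * size]) / size for i in range(segments)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_lower_bound(q_paa: Sequence[float], c_paa: Sequence[float], n: int) -> float:
    """Lower bound on the Euclidean distance of two length-n series,
    computed from their PAA summaries only (never exceeds the true distance)."""
    return math.sqrt(n / len(q_paa)) * euclidean(q_paa, c_paa)


def eps_approx_nn(
    query: Sequence[float],
    dataset: Sequence[Sequence[float]],
    summaries: Sequence[Sequence[float]],
    segments: int,
    eps: float,
) -> Tuple[int, float]:
    """Return (index, distance) of an answer within a (1 + eps) factor of
    the true nearest neighbor; eps = 0 yields the exact answer."""
    n = len(query)
    q_paa = paa(query, segments)
    # Visit candidates in increasing order of their lower bound.
    order = sorted((paa_lower_bound(q_paa, s, n), i) for i, s in enumerate(summaries))
    best_idx, best_dist = -1, math.inf
    for lb, i in order:
        # Every remaining candidate has true distance >= lb, so stopping here
        # guarantees best_dist <= (1 + eps) * (exact nearest-neighbor distance).
        if lb * (1 + eps) >= best_dist:
            break
        d = euclidean(query, dataset[i])  # full-resolution distance
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist


def random_walk(length: int, rng: random.Random) -> List[float]:
    """Synthetic random-walk series (cumulative sum of Gaussian steps)."""
    value, out = 0.0, []
    for _ in range(length):
        value += rng.gauss(0.0, 1.0)
        out.append(value)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    n, w = 128, 16
    data = [random_walk(n, rng) for _ in range(1000)]
    summaries = [paa(s, w) for s in data]  # computed once, like an index
    query = random_walk(n, rng)
    print(eps_approx_nn(query, data, summaries, w, eps=0.0))  # exact answer
    print(eps_approx_nn(query, data, summaries, w, eps=1.0))  # within 2x of exact
```

Larger values of `eps` let the loop terminate after fewer full-distance computations, trading answer quality for speed while keeping the stated bound; this illustrates the idea of a guarantee only and does not model the disk-resident indexes the paper evaluates.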