Matrix Profile Goes MAD: Variable-Length Motif and Discord Discovery in Data Series

Paper Title

Matrix Profile Goes MAD: Variable-Length Motif And Discord Discovery in Data Series

Authors

Michele Linardi, Yan Zhu, Themis Palpanas, Eamonn Keogh

Abstract

In the last fifteen years, data series motif and discord discovery have emerged as two useful and widely used primitives for data series mining, with applications in many domains, including robotics, entomology, seismology, medicine, and climatology. Nevertheless, state-of-the-art motif and discord discovery tools still require the user to specify the length of the patterns to be found. Yet, in several cases, the choice of length is critical and unforgiving. Unfortunately, the obvious brute-force solution, which tests every length within a given range, is computationally untenable. In this work, we introduce a new framework that provides an exact and scalable motif and discord discovery algorithm, which efficiently finds all motifs and discords in a given range of lengths. We evaluate our approach on five diverse real datasets and demonstrate that it is up to 20 times faster than the state of the art. Our results also show that removing the unrealistic assumption that the user knows the correct length can often produce more intuitive and actionable results that could otherwise have been missed. (Published in the journal Data Mining and Knowledge Discovery, 2020.)
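To make the "obvious brute-force solution" concrete, here is a minimal, hypothetical sketch; it is not the authors' algorithm. For each length m in [l_min, l_max] it computes a naive matrix profile (the z-normalized Euclidean distance from every length-m subsequence to its nearest non-overlapping match) from scratch, then takes the minimum as the top motif and the maximum as the top discord. Each profile costs roughly O(n² · m) here, and the whole cost is paid again for every length in the range, which is the repeated work the paper's framework is designed to avoid. The function names, the exclusion zone of m/2, and the 1/√m length normalization used to compare distances across lengths are assumptions made for illustration.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    n = len(seq)
    mu = sum(seq) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in seq) / n)
    if sd == 0:
        return [0.0] * n
    return [(x - mu) / sd for x in seq]


def matrix_profile(ts, m):
    """Naive matrix profile for length m: for each subsequence, the
    z-normalized Euclidean distance to its nearest non-trivial match."""
    subs = [znorm(ts[i:i + m]) for i in range(len(ts) - m + 1)]
    excl = max(1, m // 2)  # assumed exclusion zone: skip overlapping (trivial) matches
    profile = []
    for i, a in enumerate(subs):
        best = math.inf
        for j, b in enumerate(subs):
            if abs(i - j) < excl:
                continue
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
            best = min(best, d)
        profile.append(best)
    return profile


def brute_force_motifs_discords(ts, l_min, l_max):
    """Hypothetical brute-force baseline: recompute the profile for every
    length in [l_min, l_max] and report the top motif and top discord."""
    if l_min < 2 or len(ts) < 2 * l_max:
        raise ValueError("need 2 <= l_min and len(ts) >= 2 * l_max")
    results = {}
    for m in range(l_min, l_max + 1):
        mp = matrix_profile(ts, m)  # full recomputation for each length
        motif_d, motif_i = min((d, i) for i, d in enumerate(mp))
        discord_d, discord_i = max((d, i) for i, d in enumerate(mp))
        scale = 1.0 / math.sqrt(m)  # assumed length normalization across m
        results[m] = {
            "motif": (motif_i, motif_d * scale),
            "discord": (discord_i, discord_d * scale),
        }
    return results


ts = [0.0, 1.0, 2.0, 3.0] * 6
ts[13] = 10.0  # inject a single anomaly into a periodic series
res = brute_force_motifs_discords(ts, 3, 5)
print(res[4]["motif"], res[4]["discord"])
```

On this toy series every length-4 window that avoids index 13 has an exact repeat elsewhere, so the length-4 motif distance is 0, while the length-4 discord is one of the windows overlapping the injected spike. A practical implementation would replace the quadratic inner loop with a faster matrix profile algorithm (for example, STOMP) and, as the paper argues, avoid recomputing from scratch for every length.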
