论文标题

基于$ q $ gram出现的距离的快速和线性时间字符串匹配算法

Fast and linear-time string matching algorithms based on the distances of $q$-gram occurrences

论文作者

Kobayashi, Satoshi, Hendrian, Diptarama, Yoshinaka, Ryo, Shinohara, Ayumi

论文摘要

鉴于文本$ t $ t $长度$ n $和图案$ p $长度$ m $,弦匹配问题是找到$ t $中所有出现$ p $的任务。在这项研究中,我们提出了一种算法,该算法在$ O((n + m)Q)$((n + m)Q)$时间中解决了这一问题,考虑到$ p $中包含的两个相同$ q $ - 克的相邻出现之间的距离。我们还建议对IT的理论改进,以$ O(N + M)$时间运行,尽管在实践中不一定要快得多。我们比较了我们和现有算法在各种真实和人工数据集上的执行时间,例如英语文本,基因组序列和斐波那契字符串。实验结果表明,在许多情况下,我们的算法与最先进的算法一样快,尤其是在文本中经常出现模式时。

Given a text $T$ of length $n$ and a pattern $P$ of length $m$, the string matching problem is a task to find all occurrences of $P$ in $T$. In this study, we propose an algorithm that solves this problem in $O((n + m)q)$ time considering the distance between two adjacent occurrences of the same $q$-gram contained in $P$. We also propose a theoretical improvement of it which runs in $O(n + m)$ time, though it is not necessarily faster in practice. We compare the execution times of our and existing algorithms on various kinds of real and artificial datasets such as an English text, a genome sequence and a Fibonacci string. The experimental results show that our algorithm is as fast as the state-of-the-art algorithms in many cases, particularly when a pattern frequently appears in a text.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源