论文标题
具有对数搜索时间的语法压缩索引
Grammar-Compressed Indexes with Logarithmic Search Time
论文作者
论文摘要
令文本$ t [1..n] $为$ g $(终端和非终端)符号和尺寸$ g $(以规则右侧的长度的总和为衡量的$ g $(端子和非终端)符号)生成的唯一字符串。这样的语法,称为$ t $的语法压缩表示形式,可以使用$ g \ lg g $位编码。我们介绍了第一个使用$ o(g \ lg n)$位的语法压缩索引,并可以找到$ occ $ occ $ forters $ p [1..m] $ in time $ o(((m^2+occ)\ lg g)$。我们实施该索引,并在高度重复的文本收集中与艺术状态相比,证明其实用性。
Let a text $T[1..n]$ be the only string generated by a context-free grammar with $g$ (terminal and nonterminal) symbols, and of size $G$ (measured as the sum of the lengths of the right-hand sides of the rules). Such a grammar, called a grammar-compressed representation of $T$, can be encoded using essentially $G\lg g$ bits. We introduce the first grammar-compressed index that uses $O(G\lg n)$ bits and can find the $occ$ occurrences of patterns $P[1..m]$ in time $O((m^2+occ)\lg G)$. We implement the index and demonstrate its practicality in comparison with the state of the art, on highly repetitive text collections.