论文标题

跑步压缩字符串上的子字符串复杂性

Substring Complexities on Run-length Compressed Strings

论文作者

Kawamoto, Akiyoshi, I, Tomohiro

论文摘要

令$ s_ {t}(k)$表示字符串$ t $中长度$ k $的不同长的子字符集,然后$ k $ -th的子字符串复杂性由其基数$ | s_ {t}(k)| $定义。最近,$δ= \ max \ {| s_ {t}(k)| / k:k \ ge 1 \} $被证明是高度重复字符串的良好可压缩性度量。在本文中,给定$ t $ n $的$ t $以$ r $的运行长度压缩形式,我们表明$δ$可以以$ \ sathit {c} _ {\ mathsf {sortsf {stort}}}(r,n)$ time and $ o(r)$(r)$ space,其中$ \ mathit { o(\ min(r \ lg \ lg r,r \ lg_ {r} n))$是对$ r $ $ o(\ lg n)$ - $ o(r)$ o(r)$ space排序的时间复杂性,其中具有单词size size $ω(\ lg lg n)$。

Let $S_{T}(k)$ denote the set of distinct substrings of length $k$ in a string $T$, then the $k$-th substring complexity is defined by its cardinality $|S_{T}(k)|$. Recently, $δ= \max \{ |S_{T}(k)| / k : k \ge 1 \}$ is shown to be a good compressibility measure of highly-repetitive strings. In this paper, given $T$ of length $n$ in the run-length compressed form of size $r$, we show that $δ$ can be computed in $\mathit{C}_{\mathsf{sort}}(r, n)$ time and $O(r)$ space, where $\mathit{C}_{\mathsf{sort}}(r, n) = O(\min (r \lg\lg r, r \lg_{r} n))$ is the time complexity for sorting $r$ $O(\lg n)$-bit integers in $O(r)$ space in the Word-RAM model with word size $Ω(\lg n)$.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源