论文标题

基于聚合物的数据存储的质量错误校正代码

Mass Error-Correction Codes for Polymer-Based Data Storage

论文作者

Gabrys, Ryan, Pattabiraman, Srilakshmi, Milenkovic, Olgica

论文摘要

我们考虑纠正二进制聚合物字符串编码的信息中质量读数错误的问题。我们的工作基于使用组合多组[Acharya等,2015]以及[Pattabiraman等,2019]中提出的独特的字符串重建框架的弦线重建问题的结果。基于二进制聚合物的数据存储系统[Laure等,2016]通过设计两个明显不同质量的分子来代表符号$ \ {0,1 \} $并通过噪声串联质谱法执行读数。串联质谱仪片段片段要读取为较短的子字符串,仅报告其质量,通常由于不精确的电离而导致错误。用组成多组对碎片化过程进行建模,可以通过使用加泰罗尼亚路径的衍生物来设计能够渐近的最佳代码,并校正单个质量误差[Pattabiraman等,2019]。然而,目前尚无多个质量错误校正的解决方案。我们的工作通过描述使用多项式分解方法来解决收费公路问题的第一个多错误校正代码[Skiena等,1990]以及[Acharya等,2015]中描述的相关分解。在相应的多项式中添加芦苇 - 固体类型的编码冗余,可以使用$ t^2 \,\ log \,k $冗余位在多项式时间中纠正$ t $质量错误,其中$ k $是信息字符串长度。冗余可以提高到$ \ log \,k + t $。但是,目前已知该方案的$ t $和$ n $在$ t $和$ n $中运行多项式时间的解码算法,其中$ n $是编码字符串的长度。

We consider the problem of correcting mass readout errors in information encoded in binary polymer strings. Our work builds on results for string reconstruction problems using composition multisets [Acharya et al., 2015] and the unique string reconstruction framework proposed in [Pattabiraman et al., 2019]. Binary polymer-based data storage systems [Laure et al., 2016] operate by designing two molecules of significantly different masses to represent the symbols $\{0,1\}$ and perform readouts through noisy tandem mass spectrometry. Tandem mass spectrometers fragment the strings to be read into shorter substrings and only report their masses, often with errors due to imprecise ionization. Modeling the fragmentation process output in terms of composition multisets allows for designing asymptotically optimal codes capable of unique reconstruction and the correction of a single mass error [Pattabiraman et al., 2019] through the use of derivatives of Catalan paths. Nevertheless, no solutions for multiple-mass error-corrections are currently known. Our work addresses this issue by describing the first multiple-error correction codes that use the polynomial factorization approach for the Turnpike problem [Skiena et al., 1990] and the related factorization described in [Acharya et al., 2015]. Adding Reed-Solomon type coding redundancy into the corresponding polynomials allows for correcting $t$ mass errors in polynomial time using $t^2\, \log\,k$ redundant bits, where $k$ is the information string length. The redundancy can be improved to $\log\,k + t$. However, no decoding algorithm that runs polynomial-time in both $t$ and $n$ for this scheme are currently known, where $n$ is the length of the coded string.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源