Paper title
SPIDER-WEB generates coding algorithms with superior error tolerance and real-time information retrieval capacity
Paper authors
Paper abstract
DNA has been considered a promising medium for storing digital information. As an essential step in the DNA-based data storage workflow, coding algorithms are responsible for implementing functions including bit-to-base transcoding and error correction. In previous studies, these functions have normally been realized by combining multiple algorithms. Here, we report a graph-based architecture, named SPIDER-WEB, that provides an all-in-one coding solution by generating customized algorithms automatically. SPIDER-WEB is able to correct a maximum of 4% edit errors in DNA sequences, including substitutions and insertions/deletions (indels), with only 5.5% redundant symbols. Since no DNA sequence pretreatment is required for the correcting and decoding processes, SPIDER-WEB offers real-time information retrieval at a speed 305.08 times faster than that of single-molecule sequencing techniques. Our retrieval process is 2 orders of magnitude faster than the conventional one for megabyte-level data and can scale to exabyte-level data. Therefore, SPIDER-WEB holds the potential to improve the practicability of large-scale data storage applications.