论文标题
单个和多个删除通道的重建算法
Algorithms for reconstruction over single and multiple deletion channels
论文作者
论文摘要
DNA测序技术和DNA存储系统的最新进展重新点燃了对缺失通道的兴趣。最近的多项作品研究了单个和多个删除通道上序列重建的变体,这是由于其高度组合性质而臭名昭著的困难问题。尽管理论计算机科学中的作品提供了算法,从而保证了通过删除渠道进行多个独立观察的完美重建,但它们仅适用于大型块状制度,并且在观察次数也很大时更加限制。实际上,只有少数观察结果,在大多数情况下,甚至不可能对输入序列进行完美的重建。在这种情况下,删除通道的最大可能性(ML)和最大aposteriori(MAP)估计是出现的自然问题,并且这些问题仍然符合我们的最佳知识。在这项工作中,我们采取步骤回答上述两个问题。具体来说:1。我们表明,在单个删除通道上为ML估计求解(可以作为离散优化问题施放)等于解决其放松,这是一个连续的优化问题。 2。我们准确地计算了单个和多个删除通道的符号后分布(在先验的某些假设下)。作为我们贡献的一部分,我们还引入了可视化和分析错误事件的工具,我们认为这在有关删除渠道的其他相关问题中可能很有用。
Recent advances in DNA sequencing technology and DNA storage systems have rekindled the interest in deletion channels. Multiple recent works have looked at variants of sequence reconstruction over a single and over multiple deletion channels, a notoriously difficult problem due to its highly combinatorial nature. Although works in theoretical computer science have provided algorithms which guarantee perfect reconstruction with multiple independent observations from the deletion channel, they are only applicable in the large blocklength regime and more restrictively, when the number of observations is also large. Indeed, with only a few observations, perfect reconstruction of the input sequence may not even be possible in most cases. In such situations, maximum likelihood (ML) and maximum aposteriori (MAP) estimates for the deletion channels are natural questions that arise and these have remained open to the best of our knowledge. In this work, we take steps to answer the two aforementioned questions. Specifically: 1. We show that solving for the ML estimate over the single deletion channel (which can be cast as a discrete optimization problem) is equivalent to solving its relaxation, a continuous optimization problem; 2. We exactly compute the symbolwise posterior distributions (under some assumptions on the priors) for both the single as well as multiple deletion channels. As part of our contributions, we also introduce tools to visualize and analyze error events, which we believe could be useful in other related problems concerning deletion channels.