Paper Title
Implementing Suffix Array Algorithm Using Apache Big Table Data Implementation
Paper Authors
Abstract
In this paper we describe a new approach to the well-known suffix array algorithm using Big Table data technology. We show how a well-known algorithm can be refactored to take advantage of a high-performance distributed datastore, illustrating the advantages of cloud datastore technology for storing and retrieving large text sequences. A case study on DNA strings, considered one of the most difficult pattern-matching problems, is described and evaluated to demonstrate the potential of this implementation. We also discuss performance and other big-data-related issues, as well as possible new lines of research in big data technology for precision medicine applications.
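For readers unfamiliar with the underlying technique, the following is a minimal single-machine sketch of suffix array construction and exact pattern lookup on a DNA-like string. It is not the implementation described in the paper: the function names `build_suffix_array` and `find_occurrences` are hypothetical, and an in-memory Python list stands in for the distributed datastore, on the assumption that such a store keeps its keys in sorted order.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    """Return the start offsets of all suffixes of text in lexicographic order.

    Naive construction by sorting the suffixes directly; fine for illustration,
    too slow for genome-scale input.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start offsets at which pattern occurs in text."""
    # Truncating every suffix to len(pattern) preserves the sort order,
    # so all matches form one contiguous block found by binary search.
    prefixes = [text[i:i + len(pattern)] for i in suffix_array]
    lo = bisect_left(prefixes, pattern)
    hi = bisect_right(prefixes, pattern)
    return sorted(suffix_array[lo:hi])


if __name__ == "__main__":
    dna = "GATTACA"
    sa = build_suffix_array(dna)
    print(sa)
    print(find_occurrences(dna, sa, "TA"))
```

In a sorted key-value store, the same lookup would plausibly become a range scan over keys beginning with the pattern, which is what makes the suffix array a natural fit for that kind of storage.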