Paper title

A word recurrence based algorithm to extract genomic dictionaries

Authors

Bonnici, Vincenzo; Franco, Giuditta; Manca, Vincenzo

Abstract

Genomes may be analyzed from an information viewpoint as very long strings, containing functional elements of variable length, which have been assembled by evolution. In this work an innovative information theory based algorithm is proposed, to extract significant (relatively small) dictionaries of genomic words. Namely, conceptual analyses are here combined with empirical studies, to open up a methodology for the extraction of variable length dictionaries from genomic sequences, based on the information content of some factors. Its application to human chromosomes highlights an original inter-chromosomal similarity in terms of factor distributions.
