Paper Title
Macromolecule Classification Based on the Amino-acid Sequence
Authors
Abstract
Deep learning plays a vital role in every field that involves data. It has emerged as a strong and efficient framework that can be applied to a broad spectrum of complex learning problems that were previously difficult to solve with traditional machine learning techniques. In this study, we focus on the classification of protein sequences with deep learning techniques. The study of amino acid sequences is vital in the life sciences. We use different word embedding techniques from natural language processing to represent amino acid sequences as vectors. Our main goal is to classify sequences into four classes: DNA, RNA, protein, and hybrid. After several tests, we achieved nearly 99% accuracy on both the training and test sets. We experimented with CNN, LSTM, bidirectional LSTM, and GRU models.
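As an illustration of how a sequence might be turned into "words" before a word embedding is applied, the following is a minimal sketch and not the authors' pipeline. It assumes overlapping k-mers of length 3 as tokens; the abstract does not state the tokenization, so the function names (`kmers`, `build_vocab`, `encode`), the value of `k`, and the padding scheme are all hypothetical choices made for this example.

```python
from typing import Dict, List


def kmers(sequence: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers, treated here as 'words'."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(sequences: List[str], k: int = 3) -> Dict[str, int]:
    """Map each distinct k-mer to an integer id, in order of first appearance.

    Id 0 is reserved for padding (and, in this sketch, for unknown k-mers).
    """
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for token in kmers(seq, k):
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def encode(sequence: str, vocab: Dict[str, int], k: int = 3,
           max_len: int = 8) -> List[int]:
    """Convert a sequence to exactly max_len k-mer ids.

    Longer sequences are truncated; shorter ones are right-padded with 0.
    """
    ids = [vocab.get(token, 0) for token in kmers(sequence, k)][:max_len]
    return ids + [0] * (max_len - len(ids))


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the paper.
    training = ["MKTAYIAK", "MKTLLV"]
    vocab = build_vocab(training)
    print(encode("MKTAYIAK", vocab))
```

The resulting fixed-length integer lists are the kind of input an embedding layer placed in front of a CNN, LSTM, bidirectional LSTM, or GRU classifier would typically consume; the embedding then maps each id to a dense vector.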