Paper Title

Deep learning forward and reverse primer design to detect SARS-CoV-2 emerging variants

Authors

Wang, Hanyu; Tsinda, Emmanuel K.; Dunn, Anthony J.; Chikweto, Francis; Ahmed, Nusreen; Pelosi, Emanuela; Zemkoho, Alain B.

Abstract

Surges observed at different periods in the number of COVID-19 cases are associated with the emergence of multiple SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2) variants. Designing methods that support laboratory detection is crucial for monitoring these variants. Hence, in this paper, we develop a semi-automated method to design both forward and reverse primer sets to detect SARS-CoV-2 variants. To do so, we train deep Convolutional Neural Networks (CNNs) to classify labelled SARS-CoV-2 variants and to identify the partial genomic features needed for forward and reverse Polymerase Chain Reaction (PCR) primer design. Our proposed approach supplements existing ones while promoting the emerging concept of neural-network-assisted primer design for PCR. Our CNN model was trained using a database of SARS-CoV-2 full-length genomes from GISAID and tested on a separate dataset from NCBI, reaching 98% accuracy for the classification of variants. This result is based on the development of three different methods of feature extraction. For each SARS-CoV-2 variant (except Omicron), the selected primer sequences were present in more than 95% of sequences in an independent set of 5000 sequences of the same variant, and in fewer than 5% of sequences in other independent datasets containing 5000 sequences of each other variant. In total, we obtain 22 forward and reverse primer pairs with flexible lengths (18-25 base pairs) and expected amplicon lengths ranging between 42 and 3322 nucleotides. In addition to these appearance rates, in-silico primer checks confirmed that the identified primer pairs are suitable for accurate SARS-CoV-2 variant detection by means of PCR tests.
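The appearance-rate criterion described in the abstract (more than 95% within the target variant, below 5% in every other variant) and the expected amplicon length can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, and it assumes exact substring matching of a primer against each genome, with the reverse primer binding at the reverse complement of its sequence on the forward strand.

```python
from typing import Dict, List, Optional

# Translation table for complementing an unambiguous DNA string.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def prevalence(primer: str, genomes: List[str]) -> float:
    """Fraction of genomes containing the primer as an exact substring (assumption)."""
    if not genomes:
        return 0.0
    return sum(primer in genome for genome in genomes) / len(genomes)


def is_variant_specific(
    primer: str,
    target_genomes: List[str],
    other_genomes: Dict[str, List[str]],
    min_target: float = 0.95,  # threshold taken from the abstract
    max_other: float = 0.05,   # threshold taken from the abstract
) -> bool:
    """True if the primer occurs in more than min_target of the target variant's
    genomes and in fewer than max_other of every other variant's genomes."""
    if prevalence(primer, target_genomes) <= min_target:
        return False
    return all(
        prevalence(primer, genomes) < max_other
        for genomes in other_genomes.values()
    )


def amplicon_length(genome: str, forward: str, reverse: str) -> Optional[int]:
    """Length from the start of the forward primer to the end of the reverse
    primer's binding site on the forward strand, or None if either is missing."""
    start = genome.find(forward)
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = genome.find(site, start + len(forward))
    if end == -1:
        return None
    return end + len(site) - start
```

In practice, a check of this kind would be run on the independent 5000-sequence sets mentioned above, and a real in-silico validation would also consider mismatches, melting temperature, and secondary structure, none of which this sketch models.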
