Paper Title
Accurate RNA 3D structure prediction using a language model-based deep learning approach
Authors
Abstract
Accurate prediction of RNA three-dimensional (3D) structure remains an unsolved challenge. Determining RNA 3D structures is crucial for understanding their functions and informing RNA-targeting drug development and synthetic biology design. The structural flexibility of RNA, which leads to scarcity of experimentally determined data, complicates computational prediction efforts. Here, we present RhoFold+, an RNA language model-based deep learning method that accurately predicts 3D structures of single-chain RNAs from sequences. By integrating an RNA language model pre-trained on ~23.7 million RNA sequences and leveraging techniques to address data scarcity, RhoFold+ offers a fully automated end-to-end pipeline for RNA 3D structure prediction. Retrospective evaluations on RNA-Puzzles and CASP15 natural RNA targets demonstrate RhoFold+'s superiority over existing methods, including human expert groups. Its efficacy and generalizability are further validated through cross-family and cross-type assessments, as well as time-censored benchmarks. Additionally, RhoFold+ predicts RNA secondary structures and inter-helical angles, providing empirically verifiable features that broaden its applicability to RNA structure and function studies.
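According to the abstract, RhoFold+ also outputs RNA secondary structures and inter-helical angles, which can be checked against experimental data. The Python below is a minimal sketch of how such outputs could be compared with a reference. It makes two assumptions. First, secondary structures are exchanged in dot-bracket notation and scored with base-pair F1, a common metric that is not necessarily the one used in the paper. Second, an inter-helical angle is taken as the angle between two helix axis direction vectors; fitting those axes from 3D coordinates is outside this sketch. The function names `parse_dot_bracket`, `base_pair_f1` and `inter_helical_angle` are hypothetical and are not part of RhoFold+.

```python
import math


def parse_dot_bracket(structure: str) -> set[tuple[int, int]]:
    """Return the (i, j) base pairs (0-based, i < j) encoded in a dot-bracket string.

    Hypothetical helper: only '(' / ')' pairs and '.' are handled here;
    pseudoknot brackets such as '[' / ']' are not supported in this sketch.
    """
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {idx}")
            pairs.add((stack.pop(), idx))
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r} at position {idx}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def base_pair_f1(predicted: str, reference: str) -> float:
    """F1 score of predicted vs. reference base pairs (assumed metric)."""
    pred = parse_dot_bracket(predicted)
    ref = parse_dot_bracket(reference)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


def inter_helical_angle(axis_a, axis_b) -> float:
    """Angle in degrees (0 to 180) between two helix axis direction vectors.

    Assumption: each helix axis is already given as a 3D direction vector.
    """
    dot = sum(a * b for a, b in zip(axis_a, axis_b))
    norm_a = math.sqrt(sum(a * a for a in axis_a))
    norm_b = math.sqrt(sum(b * b for b in axis_b))
    # Clamp to [-1, 1] so floating-point rounding cannot break math.acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_theta))
```

For example, `base_pair_f1("((...))", "(.....)")` scores one of two predicted pairs as correct and recovers the only reference pair, giving 2/3. `inter_helical_angle((1, 0, 0), (0, 1, 0))` returns 90 degrees. Because the angle depends on the sign of each axis vector, a consistent 5'-to-3' axis orientation would need to be fixed before comparing predicted and experimental angles.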