Paper title

Protein language model rescue mutations highlight variant effects and structure in clinically relevant genes

Authors

Soylemez, Onuralp; Cordero, Pablo

Abstract

Despite being self-supervised, protein language models have shown remarkable performance in fundamental biological tasks such as predicting the impact of genetic variation on protein structure and function. The effectiveness of these models on a diverse set of tasks suggests that they learn meaningful representations of the fitness landscape that can be useful for downstream clinical applications. Here, we interrogate the use of these language models in characterizing known pathogenic mutations in curated, medically actionable genes through an exhaustive search of putative compensatory mutations on each variant's genetic background. Systematic analysis of the predicted effects of these compensatory mutations reveals unappreciated structural features of proteins that are missed by other structure predictors such as AlphaFold. While deep mutational scanning experiments provide an unbiased estimate of the mutational landscape, we encourage the community to generate and curate rescue mutation experiments to inform the design of more sophisticated co-masking strategies and to leverage large language models more effectively for downstream clinical prediction tasks.
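The exhaustive search for putative compensatory (rescue) mutations described in the abstract can be illustrated with a minimal sketch. It assumes only that some function maps a protein sequence to a fitness-like score (in practice this might be a protein language model pseudo-log-likelihood, which is an assumption here, not the authors' stated scoring rule). The pathogenic variant is held fixed, and every other single substitution on that mutant background is ranked by how much it raises the score. All names (`rescue_scan`, `apply_mutation`, `toy_charge_score`) are hypothetical. `toy_charge_score` is a deliberately simple stand-in that rewards net-charge balance and is not a language model.

```python
from typing import Callable, List, Optional, Tuple

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def apply_mutation(seq: str, pos: int, alt: str) -> str:
    """Return a copy of `seq` with residue `pos` (0-based) replaced by `alt`."""
    return seq[:pos] + alt + seq[pos + 1:]


def rescue_scan(
    wild_type: str,
    path_pos: int,
    path_alt: str,
    score: Callable[[str], float],
    top_k: Optional[int] = 10,
) -> List[Tuple[int, str, float]]:
    """Try every single substitution on the pathogenic-mutant background.

    Returns (position, alt residue, score gain) tuples, where the gain is
    measured relative to the pathogenic single mutant, best candidates first.
    """
    mutant = apply_mutation(wild_type, path_pos, path_alt)
    baseline = score(mutant)
    candidates = []
    for pos, ref in enumerate(mutant):
        if pos == path_pos:  # keep the pathogenic variant fixed
            continue
        for alt in AMINO_ACIDS:
            if alt == ref:  # skip the no-op substitution
                continue
            gain = score(apply_mutation(mutant, pos, alt)) - baseline
            candidates.append((pos, alt, gain))
    # Stable sort: ties keep scan order (position, then AMINO_ACIDS order).
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates if top_k is None else candidates[:top_k]


def toy_charge_score(seq: str) -> float:
    """HYPOTHETICAL stand-in for a model score: penalize net charge imbalance."""
    net = sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")
    return -float(abs(net))


# Pathogenic-like variant D2A (0-based position 1) removes the only negative charge.
hits = rescue_scan("KDAG", path_pos=1, path_alt="A", score=toy_charge_score, top_k=3)
print(hits)  # [(0, 'A', 1.0), (0, 'C', 1.0), (0, 'F', 1.0)]
```

With a real model the scan costs one score evaluation per candidate, roughly 19 × (L − 1) for a protein of length L, so batching the double-mutant sequences through the model would be the natural optimization.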
