Paper Title

RITA: a Study on Scaling Up Generative Protein Sequence Models

Authors

Daniel Hesslow, Niccoló Zanichelli, Pascal Notin, Iacopo Poli, Debora Marks

Abstract

In this work we introduce RITA: a suite of autoregressive generative models for protein sequences, with up to 1.2 billion parameters, trained on over 280 million protein sequences belonging to the UniRef-100 database. Such generative models hold the promise of greatly accelerating protein design. We conduct the first systematic study of how capabilities evolve with model size for autoregressive transformers in the protein domain: we evaluate RITA models in next amino acid prediction, zero-shot fitness, and enzyme function prediction, showing benefits from increased scale. We release the RITA models openly, to the benefit of the research community.
