Title
On the unknown proteins of eukaryotic proteomes
Authors
Abstract
To study unknown proteins on a large scale, a reference system was set up for the three major eukaryotic lineages, built from 36 proteomes chosen to be as taxonomically diverse as possible. Proteins from 362 eukaryotic proteomes that have no known homologue in this reference set were then analyzed, with a particular focus on singletons, that is, unknown proteins that also have no known homologue in their own proteome. Consistently, according to UniProt, no more than 12% of the singletons found for a given species have evidence at the protein level. Likewise, since AlphaFold2 relies on the information found in alignments of homologous sequences, its predictions of their three-dimensional structure are usually poor. In metazoan species, the number of singletons seems to increase with the evolutionary distance from the reference system. Interestingly, no such trend is found in Viridiplantae and fungi, as if singletons were added to proteomes on a different timescale in Metazoa than in other eukaryotic kingdoms. However, further studies of proteomes closer to those of the reference system are needed to confirm this phenomenon.
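The two-level classification described above (unknown proteins first, then singletons among them) can be sketched as simple set operations. This is a minimal sketch, not the authors' pipeline. It assumes that homology search results have already been computed and stored as mappings from a protein ID to the IDs of its detected homologues. The abstract does not say which homology search method was used. The function names, the `Hits` layout and the toy IDs are hypothetical. The second function computes the share of singletons with UniProt protein existence level 1 ("Evidence at protein level"), assuming those levels were retrieved separately.

```python
from typing import Dict, Set

# Hypothetical data layout: each mapping sends a protein ID to the set of IDs
# of its detected homologues (homology search assumed already done elsewhere).
Hits = Dict[str, Set[str]]


def classify_proteins(
    proteome: Set[str], ref_hits: Hits, own_hits: Hits
) -> Dict[str, Set[str]]:
    """Split a proteome into known proteins, unknown proteins and singletons."""
    # Unknown: no homologue detected in the reference set.
    unknown = {p for p in proteome if not ref_hits.get(p)}
    # Singleton: unknown AND no homologue in its own proteome (self-hits ignored).
    singletons = {p for p in unknown if not (own_hits.get(p, set()) - {p})}
    return {"known": proteome - unknown, "unknown": unknown, "singletons": singletons}


def fraction_protein_level(singletons: Set[str], pe_level: Dict[str, int]) -> float:
    """Share of singletons with UniProt protein existence level 1 (protein level)."""
    if not singletons:
        return 0.0
    return sum(1 for p in singletons if pe_level.get(p) == 1) / len(singletons)


# Toy example with made-up IDs.
proteome = {"P1", "P2", "P3", "P4", "P5"}
ref_hits = {"P1": {"R9"}}
own_hits = {"P2": {"P3"}, "P3": {"P2"}, "P4": {"P4"}}
pe_level = {"P4": 1, "P5": 4}

groups = classify_proteins(proteome, ref_hits, own_hits)
print(sorted(groups["singletons"]))                           # ['P4', 'P5']
print(fraction_protein_level(groups["singletons"], pe_level))  # 0.5
```

Excluding self-hits matters here. An all-against-all search within a proteome usually reports each protein as its own hit, so without that step no protein would ever be counted as a singleton.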