Title
SeqMapPDB: A Standalone Pipeline to Identify Representative Structures of Protein Sequences and Mapping Residue Indices in Real-Time at Proteome Scale
Authors
Abstract
Motivation: The 3D structures of proteins provide rich information for understanding their biochemical roles. Identifying representative structures for protein sequences is essential for analyzing proteins at proteome scale. However, identifying the representative structure of a given protein sequence and accurately mapping its residue indices are technically difficult. Existing databases that map between structures and sequences are usually static, which makes them unsuitable for studying proteomes with frequent gene model revisions. They often do not provide reliable and consistent representative structures that maximize sequence coverage. Furthermore, protein isomers are usually not properly resolved.
Results: To overcome these difficulties, we have developed a computational pipeline called SeqMapPDB that provides high-quality representative PDB structures for given sequences. It maps a sequence to structures that fully cover it when such structures are available, or otherwise to the set of partial, non-overlapping structural domains that maximally covers the query sequence. Residue indices are accurately mapped, and isomeric proteins are resolved. SeqMapPDB is efficient and can rapidly carry out proteome-wide mapping against a selected version of a reference genome in real time. Furthermore, SeqMapPDB offers the flexibility of a standalone pipeline for large-scale mapping of in-house sequence and structure data.
Availability: Our method is available at https://bitbucket.org/lianglabuic/seqmappdb under the GNU GPL license.
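The abstract does not spell out how the maximally covering set of non-overlapping domains is chosen or how residue indices are transferred. As a minimal sketch, assuming each candidate PDB chain has already been aligned to the query and reduced to one contiguous range of query positions, picking non-overlapping ranges that cover the most residues is an instance of weighted interval scheduling. Index mapping can then be read off the pairwise alignment column by column. The function names `max_coverage` and `map_residue_indices`, the `(start, end, chain_id)` tuple layout, and the example residue IDs are hypothetical illustrations, not SeqMapPDB's actual code.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical layout: (start, end, chain_id), a 1-based, inclusive range of
# query residues covered by one aligned PDB chain.
Segment = Tuple[int, int, str]


def max_coverage(segments: List[Segment]) -> Tuple[int, List[Segment]]:
    """Choose non-overlapping segments maximizing the number of covered residues.

    Weighted interval scheduling, with each segment weighted by its length.
    """
    segs = sorted(segments, key=lambda s: s[1])  # sort by end position
    ends = [s[1] for s in segs]
    n = len(segs)
    best = [0] * (n + 1)  # best[i]: max coverage using only segs[:i]
    prev = [0] * n  # prev[k]: number of segments ending before segs[k] starts
    for k, (start, end, _) in enumerate(segs):
        # Among segs[:k], those ending strictly before `start` form a prefix.
        prev[k] = bisect_right(ends, start - 1, 0, k)
        take = best[prev[k]] + (end - start + 1)
        best[k + 1] = max(best[k], take)

    # Backtrack to recover one optimal set of segments.
    chosen: List[Segment] = []
    i = n
    while i > 0:
        if best[i] == best[i - 1]:
            i -= 1  # segs[i - 1] is not needed for this optimum
        else:
            chosen.append(segs[i - 1])
            i = prev[i - 1]
    chosen.reverse()
    return best[n], chosen


def map_residue_indices(
    query_aln: str, struct_aln: str, struct_res_ids: List[str]
) -> Dict[int, str]:
    """Map 1-based query positions to structure residue IDs via an alignment.

    struct_res_ids lists the structure's observed residue IDs (assumed here to
    be PDB-style author numbers, possibly with insertion codes such as "11A"),
    in the same order as the ungapped structure sequence.
    """
    if len(query_aln) != len(struct_aln):
        raise ValueError("aligned strings must have equal length")
    if len(struct_aln.replace("-", "")) != len(struct_res_ids):
        raise ValueError("one residue ID is needed per structure residue")
    mapping: Dict[int, str] = {}
    q_pos = 0  # 1-based index of the most recent query residue
    s_idx = -1  # 0-based index into struct_res_ids
    for q_char, s_char in zip(query_aln, struct_aln):
        if q_char != "-":
            q_pos += 1
        if s_char != "-":
            s_idx += 1
        if q_char != "-" and s_char != "-":
            # Aligned column: record query position -> structure residue ID.
            mapping[q_pos] = struct_res_ids[s_idx]
    return mapping
```

For example, with candidate ranges A (1-120), B (100-250), C (130-300) and D (260-350), `max_coverage` selects A and C for 291 covered residues. A chain spanning the whole query attains the maximum possible coverage under this objective, though ties with other sets are not broken here; a real pipeline would likely add criteria such as resolution.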