Paper title
Site2Vec: a reference frame invariant algorithm for vector embedding of protein-ligand binding sites
Paper authors
Paper abstract
Protein-ligand interactions are one of the fundamental types of molecular interactions in living systems. Ligands are small molecules that interact with protein molecules at specific regions on their surfaces called binding sites. Tasks such as assessing protein functional similarity and detecting drug side effects require identifying similar binding sites in disparate proteins across diverse pathways. Machine learning methods for similarity assessment require feature descriptors of binding sites. Traditional methods based on hand-engineered motifs and atomic configurations do not scale to several thousand sites. Deep neural network algorithms, which can capture very complex input feature spaces, are now deployed for this purpose. However, one fundamental challenge in applying deep learning to binding site structures is the choice of input representation and reference frame. We report here a novel algorithm, Site2Vec, that derives a reference frame invariant vector embedding of a protein-ligand binding site. The method is based on pairwise distances between representative points and on chemical compositions in terms of the constituent amino acids of a site. The vector embedding serves as a locality-sensitive hash function for proximity queries and for determining similar sites. The method was the top performer, with quality scores above 95%, in extensive benchmarking studies carried out over 10 datasets and against 23 other site comparison methods. The algorithm supports high-throughput processing and has been evaluated for stability with respect to reference frame shifts, coordinate perturbations and residue mutations. We provide Site2Vec as a standalone executable and as a web service hosted at \url{http://services.iittp.ac.in/bioinfo/home}.
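The abstract attributes reference frame invariance to the use of pairwise distances. The following is a minimal sketch of why that holds, not the Site2Vec algorithm itself: the function names, the choice of representative points, the bin width and the number of bins are all assumptions made for illustration. It builds a histogram of pairwise distances for a toy set of points and checks that the histogram is unchanged after the points are rotated and translated.

```python
import math
from itertools import combinations


def pairwise_distance_histogram(points, bin_width=2.0, n_bins=10):
    """Histogram of all pairwise Euclidean distances.

    Hypothetical descriptor for illustration only; bin_width and n_bins
    are assumed values, not parameters taken from the paper.
    """
    hist = [0] * n_bins
    for p, q in combinations(points, 2):
        d = math.dist(p, q)
        # Distances beyond the last bin edge are clamped into the last bin.
        idx = min(int(d / bin_width), n_bins - 1)
        hist[idx] += 1
    return hist


def rotate_z_and_translate(points, angle, shift):
    """Rigid motion: rotate about the z axis by `angle` radians, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in points
    ]


# Toy "representative points" of a site (made-up coordinates).
site = [(0.0, 0.0, 0.0), (3.1, 0.0, 0.0), (0.0, 4.7, 0.0), (1.3, 1.1, 5.2)]

# The same site expressed in a different reference frame.
moved = rotate_z_and_translate(site, angle=0.7, shift=(10.0, -5.0, 2.5))

original_hist = pairwise_distance_histogram(site)
moved_hist = pairwise_distance_histogram(moved)
same = original_hist == moved_hist
print(original_hist, moved_hist, same)
```

Because rigid motions preserve distances, any descriptor computed only from pairwise distances is independent of the coordinate frame. A distance that falls almost exactly on a bin edge could change bins through floating-point rounding, so a practical descriptor would need to handle that case.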