Paper Title
Semi-Supervised Hierarchical Drug Embedding in Hyperbolic Space
Paper Authors
Paper Abstract
Learning accurate drug representations is essential for tasks such as computational drug repositioning and prediction of drug side-effects. A drug hierarchy is a valuable source that encodes human knowledge of drug relations in a tree-like structure where drugs that act on the same organs, treat the same disease, or bind to the same biological target are grouped together. However, its utility in learning drug representations has not yet been explored, and currently described drug representations cannot place novel molecules in a drug hierarchy. Here, we develop a semi-supervised drug embedding that incorporates two sources of information: (1) underlying chemical grammar that is inferred from molecular structures of drugs and drug-like molecules (unsupervised), and (2) hierarchical relations that are encoded in an expert-crafted hierarchy of approved drugs (supervised). We use the Variational Auto-Encoder (VAE) framework to encode the chemical structures of molecules and use the knowledge-based drug-drug similarity to induce the clustering of drugs in hyperbolic space. Hyperbolic space is amenable to encoding hierarchical concepts. Both quantitative and qualitative results indicate that the learned drug embedding can accurately reproduce the chemical structure and induce the hierarchical relations among drugs. Furthermore, our approach can infer the pharmacological properties of novel molecules by retrieving similar drugs from the embedding space. We demonstrate that the learned drug embedding can be used to find new uses for existing drugs and to discover side-effects. We show that it significantly outperforms baselines in both tasks.