Paper title

A Bayesian algorithm for retrosynthesis

Authors

Zhongliang Guo, Stephen Wu, Mitsuru Ohno, Ryo Yoshida

Abstract

The identification of synthetic routes that end with a desired product has been an inherently time-consuming process that depends largely on expert knowledge covering only a limited fraction of the entire reaction space. At present, emerging machine-learning technologies are overturning the process of retrosynthetic planning. The objective of this study is to discover synthetic routes backward from a given desired molecule to commercially available compounds. The problem is reduced to a combinatorial optimization task with the solution space subject to the combinatorial complexity of all possible pairs of purchasable reactants. We address this issue within the framework of Bayesian inference and computation. The workflow consists of two steps: first, a deep neural network is trained to predict the product of given reactants with a high level of accuracy; this forward model is then inverted into a backward model via Bayes' law of conditional probability. Using the backward model, a diverse set of highly probable reaction sequences ending with a given synthetic target is exhaustively explored using a Monte Carlo search algorithm. The Bayesian retrosynthesis algorithm successfully rediscovered 80.3% and 50.0% of known synthetic routes of single-step and two-step reactions, respectively, within top-10 accuracy, thereby outperforming state-of-the-art algorithms in terms of overall accuracy. Remarkably, the Monte Carlo method, which was specifically designed for the presence of diverse multiple routes, often revealed a ranked list of hundreds of reaction routes to the same synthetic target. We investigated the potential applicability of such diverse candidates based on expert knowledge from synthetic organic chemistry.
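
The two-step workflow in the abstract (a forward model, then inversion by Bayes' law and a Monte Carlo search over reactant pairs) can be illustrated with a toy sketch. This is not the authors' implementation. The catalog, the lookup table standing in for the trained neural network, the uniform prior, and the Metropolis proposal are all assumptions made up for illustration. With a uniform prior over reactant pairs, the backward model satisfies p(reactants | product) ∝ p(product | reactants), so sampling from it needs only forward-model evaluations.

```python
import random
from collections import Counter

# Hypothetical catalog of purchasable reactants (toy labels, not real molecules).
CATALOG = ["A", "B", "C", "D", "E", "F"]

# Hypothetical lookup table standing in for a trained forward model:
# unordered reactant pair -> {product: p(product | reactants)}.
FORWARD_TABLE = {
    frozenset(["A", "B"]): {"P": 0.9},
    frozenset(["C", "D"]): {"P": 0.6, "Q": 0.3},
    frozenset(["E", "F"]): {"Q": 0.8},
}
EPS = 0.05  # assumed floor likelihood for pairs the toy model knows nothing about


def forward_likelihood(product, pair):
    """Toy p(product | reactants); a real system would query a neural network."""
    return FORWARD_TABLE.get(frozenset(pair), {}).get(product, EPS)


def unnormalized_posterior(product, pair):
    """Bayes' law with an assumed uniform prior: posterior is proportional to likelihood."""
    prior = 1.0
    return forward_likelihood(product, pair) * prior


def metropolis_search(product, n_steps=50000, seed=0):
    """Sample reactant pairs from the backward model and rank them by visit count."""
    rng = random.Random(seed)
    current = (rng.choice(CATALOG), rng.choice(CATALOG))
    visits = Counter()
    for _ in range(n_steps):
        # Symmetric proposal: resample one of the two reactants uniformly.
        proposal = list(current)
        proposal[rng.randrange(2)] = rng.choice(CATALOG)
        proposal = tuple(proposal)
        ratio = (unnormalized_posterior(product, proposal)
                 / unnormalized_posterior(product, current))
        if rng.random() < min(1.0, ratio):
            current = proposal
        # Count unordered pairs, since (A, B) and (B, A) are the same route.
        visits[frozenset(current)] += 1
    return visits.most_common()


if __name__ == "__main__":
    for pair, count in metropolis_search("P")[:5]:
        print(sorted(pair), count)
```

Because the sampler visits each pair roughly in proportion to its posterior probability, it returns a ranked list of several candidate routes to the same target rather than a single best answer, which mirrors the diversity the abstract emphasizes.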
