Paper title
Improved decision making with similarity based machine learning: Applications in chemistry
Paper authors
Paper abstract
Despite fundamental progress in autonomous molecular and materials discovery, data scarcity throughout chemical compound space still severely hampers the use of modern ready-made machine learning models, which rely heavily on the paradigm 'the bigger the data, the better'. We present similarity-based machine learning (SML), an approach that selects data and trains a model on the fly for specific queries, enabling decision making in data-scarce scenarios in chemistry. Because training points are chosen solely by their proximity to the query, only a fraction of the data is needed to converge to competitive performance. After introducing SML for the harmonic oscillator and the Rosenbrock function, we describe applications to data-scarce scenarios in chemistry, including quantum-mechanics-based molecular design and organic synthesis planning. Finally, we derive a relationship between the intrinsic dimensionality and the volume of feature space that governs the overall model accuracy.
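The query-specific selection described in the abstract can be illustrated with a minimal sketch on the Rosenbrock function. The abstract does not specify the regressor, the distance metric, or any hyperparameters, so everything below is an assumption made for illustration: Euclidean distance for proximity, a Gaussian-kernel ridge regression as the local model, and the hypothetical names `select_nearest`, `krr_predict`, `sml_predict`, `n_train`, `sigma`, and `reg`. It is not the authors' implementation.

```python
import numpy as np


def rosenbrock(x, a=1.0, b=100.0):
    """Two-dimensional Rosenbrock function, evaluated row-wise on an (n, 2) array."""
    return (a - x[:, 0]) ** 2 + b * (x[:, 1] - x[:, 0] ** 2) ** 2


def select_nearest(X_pool, x_query, n_train):
    """Indices of the n_train pool points closest to the query (Euclidean, assumed)."""
    dists = np.linalg.norm(X_pool - x_query, axis=1)
    return np.argsort(dists)[:n_train]


def krr_predict(X_train, y_train, x_query, sigma=0.5, reg=1e-6):
    """Gaussian-kernel ridge regression fitted on the given training points only."""
    d2 = np.sum((X_train[:, None, :] - X_train[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    # Solve (K + reg * I) alpha = y for the regression weights.
    alpha = np.linalg.solve(K + reg * np.eye(len(X_train)), y_train)
    k_q = np.exp(-np.sum((X_train - x_query) ** 2, axis=1) / (2.0 * sigma ** 2))
    return float(k_q @ alpha)


def sml_predict(X_pool, y_pool, x_query, n_train=50, **kwargs):
    """Select the nearest points for this query, then train and predict on the fly."""
    idx = select_nearest(X_pool, x_query, n_train)
    return krr_predict(X_pool[idx], y_pool[idx], x_query, **kwargs)


# Synthetic data pool: random points in [-2, 2]^2 with Rosenbrock labels.
rng = np.random.default_rng(0)
X_pool = rng.uniform(-2.0, 2.0, size=(2000, 2))
y_pool = rosenbrock(X_pool)

x_query = np.array([0.5, 0.5])
pred = sml_predict(X_pool, y_pool, x_query, n_train=50)
true = rosenbrock(x_query[None, :])[0]
print(f"prediction: {pred:.4f}, reference: {true:.4f}")
```

In this sketch a new model is fitted for every query using only 50 of the 2000 pool points, which mirrors the idea that a small, well-chosen fraction of the data can suffice. Varying `n_train` and comparing against a model trained on randomly drawn points would be one way to explore the trade-off.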