Paper Title
Molecular Design Using Signal Processing and Machine Learning: Time-Frequency-like Representation and Forward Design
Authors
Abstract
The accumulation of molecular data obtained from quantum mechanics (QM) theories such as density functional theory (DFTQM) makes it possible for machine learning (ML) to accelerate the discovery of new molecules, drugs, and materials. Models that combine QM with ML (QM-ML) have been very effective at delivering the precision of QM at the speed of ML. In this study, we show that integrating well-known signal processing (SP) techniques (i.e., the short-time Fourier transform, continuous wavelet analysis, and the Wigner-Ville distribution) into the QM-ML pipeline yields powerful machinery (QM-SP-ML) that can be used for the representation, visualization, and forward design of molecules. More precisely, we show that the time-frequency-like representation of molecules encodes their structural, geometric, energetic, electronic, and thermodynamic properties. This is demonstrated by using the new representation in the forward design loop as input to a deep convolutional neural network trained on DFTQM calculations, which outputs the properties of the molecules. Tested on the QM9 dataset (composed of 133,855 molecules and 19 properties), the new QM-SP-ML model predicts the properties of molecules with a mean absolute error (MAE) below acceptable chemical accuracy (i.e., MAE < 1 kcal/mol for total energies and MAE < 0.1 eV for orbital energies). Furthermore, the new approach performs similarly to or better than other state-of-the-art ML techniques described in the literature. Overall, we show that the new QM-SP-ML model represents a powerful technique for molecular forward design. All code and data generated and used in this study are available as supporting materials at https://github.com/TABeau/QM-SP-ML.
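As a rough illustration of the ideas in the abstract, the sketch below computes a short-time Fourier transform magnitude image from a 1-D signal and evaluates an MAE against the chemical-accuracy thresholds quoted above. It is not the authors' implementation. The abstract does not say how a molecule is converted into a 1-D signal, so `toy_signal` is a random placeholder, and the function names, frame length, and hop size are all assumptions chosen for illustration.

```python
import numpy as np

# Thresholds quoted in the abstract.
TOTAL_ENERGY_MAE_LIMIT_KCAL_MOL = 1.0
ORBITAL_ENERGY_MAE_LIMIT_EV = 0.1


def stft_magnitude(signal, frame_len=16, hop=4):
    """Return the STFT magnitude as a 2-D array (frequency bins x frames).

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        # Zero-pad short signals so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))
    window = np.hanning(frame_len)
    starts = range(0, signal.size - frame_len + 1, hop)
    frames = np.stack([signal[s:s + frame_len] * window for s in starts])
    # rfft along each windowed frame, then transpose to (frequency, frame).
    return np.abs(np.fft.rfft(frames, axis=1)).T


def mean_absolute_error(predicted, reference):
    """Mean absolute error between predicted and reference property values."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(predicted - reference)))


def within_chemical_accuracy(mae, limit):
    """True when the MAE is strictly below the given limit."""
    return mae < limit


# Hypothetical stand-in for a molecule-derived 1-D signal (assumption).
rng = np.random.default_rng(0)
toy_signal = rng.normal(size=64)

# 2-D time-frequency-like image that a convolutional network could consume.
image = stft_magnitude(toy_signal)

# Made-up orbital-energy predictions and references in eV, for illustration only.
mae_ev = mean_absolute_error([-6.95, -7.10], [-7.00, -7.05])
ok = within_chemical_accuracy(mae_ev, ORBITAL_ENERGY_MAE_LIMIT_EV)
```

In a full pipeline, an image such as `image` would be the input to the convolutional network, and the MAE check would be applied per property to the network's predictions on held-out molecules.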