Paper Title
Beyond Chemical 1D knowledge using Transformers
Paper Authors
Abstract
In the present paper we evaluated the efficiency of recent Transformer-CNN models for predicting target properties from augmented stereochemical SMILES. We selected a well-known activity cliff dataset as well as a dipole moment dataset and compared the effect of three representations of R/S stereochemistry in SMILES. The representations considered were SMILES without stereochemistry (noChiSMI), the classical relative stereochemistry encoding (RelChiSMI), and an alternative version with absolute stereochemistry encoding (AbsChiSMI). Including R/S labels in the SMILES representation simplified the assignment of stereochemical information directly from the SMILES string, but did not consistently improve performance on regression or classification tasks. Interestingly, we observed no degradation in the performance of Transformer-CNN models when stereochemical information was absent from the SMILES. Moreover, these models showed higher or similar performance compared to descriptor-based models built on 3D structures. These observations are an important step toward NLP modeling of 3D chemical tasks. An open challenge remains whether Transformer-CNN models can efficiently embed 3D knowledge from SMILES input, and whether a better representation could further increase the accuracy of this approach.
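To illustrate the difference between a stereochemistry-annotated SMILES and its noChiSMI-style counterpart, here is a minimal sketch in plain Python (no cheminformatics library). The function name and the example molecules are ours, not from the paper; the paper's actual encodings (RelChiSMI/AbsChiSMI) involve more than this simple marker stripping:

```python
import re

def strip_stereo(smiles: str) -> str:
    """Remove tetrahedral (@/@@) and cis/trans (/ and \\) stereo markers
    from a SMILES string, approximating a stereochemistry-free
    (noChiSMI-style) representation. Illustrative only: it does not
    re-canonicalize the result."""
    # Turn bracketed stereocenters like [C@H] / [C@@H] back into plain C.
    s = re.sub(r"\[C@{1,2}H\]", "C", smiles)
    # Drop cis/trans double-bond direction markers.
    s = s.replace("/", "").replace("\\", "")
    return s

# L-alanine with an R/S-bearing stereocenter vs. its flat form:
print(strip_stereo("C[C@@H](C(=O)O)N"))  # CC(C(=O)O)N
# trans-difluoroethene vs. its form without cis/trans information:
print(strip_stereo("F/C=C/F"))           # FC=CF
```

In the paper's terms, the input strings here play the role of stereochemistry-annotated SMILES and the outputs the role of noChiSMI inputs fed to the Transformer-CNN.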