High throughput screening with machine learning

Paper title

High throughput screening with machine learning

Authors

Oleksandr Gurbych, Maksym Druchok, Dzvenymyra Yarish, Sofiya Garkot

Abstract

This study assesses the efficiency of several popular machine learning approaches in predicting molecular binding affinity: CatBoost, Graph Attention Neural Network, and Bidirectional Encoder Representations from Transformers. The models were trained to predict binding affinities, expressed as inhibition constants $K_i$, for pairs of proteins and small organic molecules. The first two approaches use carefully selected physico-chemical features, while the third is based on textual molecular representations; it is one of the first attempts to apply Transformer-based predictors to binding affinity. We also discuss the visualization of attention layers within the Transformer approach in order to highlight the molecular sites responsible for interactions. None of the approaches uses atomic spatial coordinates, which avoids bias from known structures and allows generalization to compounds with unknown conformations. The accuracy achieved by all of the suggested approaches demonstrates their potential for high throughput screening.
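
As a small illustration of what a $K_i$ regression target can look like, the sketch below converts an inhibition constant to the logarithmic scale $pK_i = -\log_{10} K_i$ (with $K_i$ in mol/L), which is a common way to express binding affinity. The abstract does not say whether the authors train on raw or log-transformed values, so the transformation, the nanomolar input unit, and the function name `ki_to_pki` are all assumptions made here for illustration.

```python
import math


def ki_to_pki(ki_nanomolar: float) -> float:
    """Convert an inhibition constant given in nM to pK_i.

    pK_i = -log10(K_i in mol/L). The nanomolar input unit is an
    assumption of this sketch, not something stated in the abstract.
    """
    if ki_nanomolar <= 0:
        raise ValueError("K_i must be a positive concentration")
    # 1 nM = 1e-9 mol/L
    return -math.log10(ki_nanomolar * 1e-9)


if __name__ == "__main__":
    # Stronger binders have smaller K_i and therefore larger pK_i.
    for ki in (1.0, 100.0, 10_000.0):
        print(f"K_i = {ki:>8.1f} nM  ->  pK_i = {ki_to_pki(ki):.2f}")
```

On this scale a tenfold decrease in $K_i$ raises $pK_i$ by one unit, which keeps values spanning several orders of magnitude within a range that is convenient for regression models.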
