Title
Split Optimization for Protein/Ligand Binding Models
Authors
Abstract
In this paper, we investigate potential biases in datasets used to make drug binding predictions with machine learning. We examine a recently published metric, the Asymmetric Validation Embedding (AVE) bias, which is used to quantify such bias and to detect overfitting. We compare it to a slightly revised version and introduce a new weighted metric. We find that the new metrics make it possible to quantify overfitting without overly limiting the training data, and that they produce models with greater predictive value.