Paper Title

Learning across label confidence distributions using Filtered Transfer Learning

Paper Authors

Seyed Ali Madani Tonekaboni, Andrew E. Brereton, Zhaleh Safikhani, Andreas Windemuth, Benjamin Haibe-Kains, Stephen MacKinnon

Paper Abstract

Performance of neural network models relies on the availability of large datasets with minimal levels of uncertainty. Transfer Learning (TL) models have been proposed to resolve the issue of small dataset size by letting the model train on a bigger, task-related reference dataset and then fine-tune on a smaller, task-specific dataset. In this work, we apply a transfer learning approach to improve predictive power in noisy data systems with large variable confidence datasets. We propose a deep neural network method called Filtered Transfer Learning (FTL) that defines multiple tiers of data confidence as separate tasks in a transfer learning setting. The deep neural network is fine-tuned in a hierarchical process by iteratively removing (filtering) data points with lower label confidence, and retraining. In this report we use FTL for predicting the interaction of drugs and proteins. We demonstrate that using FTL to learn stepwise, across the label confidence distribution, results in higher performance compared to deep neural network models trained on a single confidence range. We anticipate that this approach will enable the machine learning community to benefit from large datasets with uncertain labels in fields such as biology and medicine.
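The FTL procedure the abstract describes (tier the data by label confidence, then iteratively filter out lower-confidence points and retrain) can be pictured as a simple training loop. Below is a minimal sketch in PyTorch, not the authors' implementation: the function name `ftl_train`, the tier thresholds, and all hyperparameters are hypothetical choices for illustration, and it assumes each example carries a label-confidence score in [0, 1].

```python
# Hypothetical sketch of a Filtered Transfer Learning (FTL) loop.
# Assumes PyTorch; names and hyperparameters are illustrative only.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def ftl_train(model: nn.Module,
              features: torch.Tensor,      # shape (N, d)
              labels: torch.Tensor,        # shape (N,), binary labels
              confidence: torch.Tensor,    # shape (N,), per-label confidence
              tiers=(0.0, 0.5, 0.9),       # ascending confidence thresholds
              epochs_per_tier: int = 5,
              lr: float = 1e-3) -> nn.Module:
    """Fine-tune `model` hierarchically: start on the full (noisy) dataset,
    then at each tier filter out points below the confidence threshold
    and retrain on the remaining higher-confidence subset."""
    criterion = nn.BCEWithLogitsLoss()
    for threshold in tiers:  # each pass keeps a cleaner subset
        keep = confidence >= threshold
        loader = DataLoader(TensorDataset(features[keep], labels[keep]),
                            batch_size=128, shuffle=True)
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs_per_tier):
            for x, y in loader:
                optimizer.zero_grad()
                loss = criterion(model(x).squeeze(-1), y.float())
                loss.backward()
                optimizer.step()
    return model
```

One design note on this sketch: re-creating the optimizer at each tier restarts its state on the cleaner subset; a variant could instead shrink the learning rate per tier so later, higher-confidence passes make smaller adjustments to the pretrained weights.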
