A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning

Title


A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning

Authors

Clyde, Austin; Brettin, Tom; Partin, Alexander; Shaulik, Maulik; Yoo, Hyunseung; Evrard, Yvonne; Zhu, Yitan; Xia, Fangfang; Stevens, Rick

Abstract


By combining various cancer cell line (CCL) drug screening panels, the amount of available data has grown enough that we can begin to understand how advances in deep learning can improve drug response prediction. In this paper, we train more than 35,000 neural network models, sweeping over common featurization techniques. We found RNA-seq data to be highly redundant and informative even with subsets larger than 128 features. We found that including single nucleotide polymorphisms (SNPs), coded as count matrices, significantly improved model performance. We also found no substantial difference in model performance between two molecular featurizations: the common open-source Mordred descriptors and the Dragon7 descriptors. Alongside this analysis, we outline data integration across CCL screening datasets. We also present evidence that new metrics, techniques for imbalanced data, and advances in data standardization need to be developed.
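The abstract says SNPs were "coded as count matrices" but does not spell out the encoding. The following is a minimal sketch of one plausible reading. It assumes each observed variant arrives as a (cell line, gene) call, and that the feature for a given cell line and gene is the number of SNPs observed in that gene. The function name `snp_count_matrix`, the cell line IDs, and the sample calls are hypothetical illustrations, not the authors' pipeline.

```python
from collections import Counter


def snp_count_matrix(variant_calls, cell_lines, genes):
    """Encode SNP calls as a cell-line x gene count matrix (hypothetical helper).

    Assumption: variant_calls is an iterable of (cell_line, gene) pairs,
    one pair per observed SNP. Returns one row per cell line, with integer
    counts in the column order given by `genes`. Calls whose cell line or
    gene is not listed are ignored.
    """
    # Map each (cell_line, gene) pair to the number of SNPs observed for it.
    counts = Counter(variant_calls)
    # Counter returns 0 for missing keys, so absent pairs become 0 counts.
    return [[counts[(cl, g)] for g in genes] for cl in cell_lines]


if __name__ == "__main__":
    calls = [
        ("CCL_A", "TP53"),
        ("CCL_A", "TP53"),
        ("CCL_A", "KRAS"),
        ("CCL_B", "EGFR"),
        ("CCL_B", "BRAF"),  # BRAF is not in the gene list, so it is dropped
    ]
    genes = ["TP53", "KRAS", "EGFR"]
    for row in snp_count_matrix(calls, ["CCL_A", "CCL_B"], genes):
        print(row)  # prints [2, 1, 0] then [0, 0, 1]
```

Each row is a fixed-length vector, so it can sit alongside other cell-line features such as RNA-seq expression. Whether the counts are further normalized or binarized is a design choice the abstract leaves open.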
