Paper Title
A Pipeline for Integrated Theory and Data-Driven Modeling of Genomic and Clinical Data
Paper Authors
Paper Abstract
High-throughput genome sequencing technologies such as RNA-Seq and microarrays have the potential to transform clinical decision making and biomedical research by enabling measurements of the genome at a granular level. However, to truly understand the causes of disease and the effects of medical interventions, these data must be integrated with phenotypic, environmental, and behavioral data from individuals. Further, effective knowledge discovery methods that can infer relationships between these data types are required. In this work, we propose a pipeline for knowledge discovery from integrated genomic and clinical data. The pipeline begins with a novel variable selection method and then uses a probabilistic graphical model to understand the relationships between features in the data. We demonstrate how this pipeline can improve breast cancer outcome prediction models and provide a biologically interpretable view of sequencing data.