Paper title
Smaller $p$-values in genomics studies using distilled historical information
Paper authors
Paper abstract
Medical research institutions have generated massive amounts of biological data by genetically profiling hundreds of cancer cell lines. In parallel, academic biology labs have conducted genetic screens on small numbers of cancer cell lines under custom experimental conditions. In order to share information between these two approaches to scientific discovery, this article proposes a "frequentist assisted by Bayes" (FAB) procedure for hypothesis testing that allows historical information from massive genomics datasets to increase the power of hypothesis tests in specialized studies. The exchange of information takes place through a novel probability model for multimodal genomics data, which distills historical information pertaining to cancer cell lines and genes across a wide variety of experimental contexts. If the relevance of the historical information for a given study is high, then the resulting FAB tests can be more powerful than the corresponding classical tests. If the relevance is low, then the FAB tests yield as many discoveries as the classical tests. Simulations and practical investigations demonstrate that the FAB testing procedure can increase the number of effects discovered in genomics studies while still maintaining strict control of type I error and false discovery rates.
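To make the idea of a FAB test concrete, the following is a minimal sketch for a single effect estimate, not the multimodal genomics model the abstract refers to. It assumes an estimate `y` that is normally distributed around the true effect with known standard error `sigma`, and a normal summary of historical information with mean `mu` and variance `tau2`; all of these names, and the functions `classical_p` and `fab_p`, are illustrative assumptions rather than anything specified in the abstract. For any fixed `mu` and `tau2` chosen independently of `y`, the FAB p-value below is uniformly distributed under the null hypothesis of zero effect, so type I error is controlled regardless of how relevant the historical information turns out to be.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def classical_p(y: float, sigma: float) -> float:
    """Two-sided z-test p-value for H0: effect = 0."""
    return 2.0 * (1.0 - norm_cdf(abs(y) / sigma))


def fab_p(y: float, sigma: float, mu: float, tau2: float) -> float:
    """FAB p-value for H0: effect = 0 in the normal-mean setting.

    Assumed (hypothetical) inputs:
      y     -- effect estimate, y ~ N(effect, sigma^2)
      sigma -- known standard error of y
      mu    -- historical (prior) mean of the effect
      tau2  -- historical (prior) variance, must be > 0; large values
               encode low relevance of the historical information
    """
    z = y / sigma
    shift = 2.0 * mu * sigma / tau2  # tends to 0 as tau2 grows
    return 1.0 - abs(norm_cdf(z + shift) - norm_cdf(-z))


if __name__ == "__main__":
    y, sigma = 2.0, 1.0
    print("classical:           ", classical_p(y, sigma))
    print("FAB, relevant prior: ", fab_p(y, sigma, mu=2.0, tau2=4.0))
    print("FAB, vague prior:    ", fab_p(y, sigma, mu=2.0, tau2=math.inf))
```

With `y = 2` and `sigma = 1`, historical information pointing in the same direction as the estimate (`mu = 2`, `tau2 = 4`) lowers the p-value from about 0.046 to about 0.024, while an infinitely vague prior reproduces the classical p-value. This mirrors the behavior described in the abstract: more power when the historical information is relevant, and agreement with the classical test when it is not.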