Title
Nonparametric Bayes Differential Analysis of Multigroup DNA Methylation Data
Authors
Abstract
DNA methylation datasets in cancer studies comprise measurements at a large number of genomic locations, called cytosine-phosphate-guanine (CpG) sites, with complex correlation structures. A fundamental goal of these studies is to develop statistical techniques that can identify disease genomic signatures across multiple patient groups defined by different experimental or biological conditions. We propose BayesDiff, a nonparametric Bayesian approach for differential analysis that relies on a novel class of first-order mixture models called the Sticky Pitman-Yor process or two-restaurant two-cuisine franchise (2R2CF). The BayesDiff methodology flexibly utilizes information from all CpG sites or probes, adaptively accommodates any serial dependence due to the widely varying inter-probe distances, and performs simultaneous inference about the differential genomic signature of the patient groups. Using simulation studies, we demonstrate the effectiveness of the BayesDiff procedure relative to existing statistical techniques for differential DNA methylation. The methodology is applied to a gastrointestinal (GI) cancer dataset that displays both serial correlations and interaction patterns. The results support and complement known aspects of DNA methylation and gene association in upper GI cancers.