Paper title
Bayesian biclustering for microbial metagenomic sequencing data via multinomial matrix factorization
Authors
Abstract
High-throughput sequencing technology provides unprecedented opportunities to quantitatively explore the human gut microbiome and its relation to diseases. Microbiome data are compositional, sparse, noisy, and heterogeneous, which poses serious challenges for statistical modeling. We propose an identifiable Bayesian multinomial matrix factorization model to infer overlapping clusters on both microbes and hosts. The proposed method represents the observed over-dispersed, zero-inflated count matrix as Dirichlet-multinomial mixtures, on which latent cluster structures are built hierarchically. Under the Bayesian framework, the number of clusters is determined automatically and available information from a taxonomic rank tree of microbes is naturally incorporated, which greatly improves the interpretability of our findings. We demonstrate the utility of the proposed approach by comparing it with alternative methods in simulations. An application to a human gut microbiome dataset involving patients with inflammatory bowel disease reveals interesting clusters containing the bacterial families Bacteroidaceae, Bifidobacteriaceae, Enterobacteriaceae, Fusobacteriaceae, Lachnospiraceae, Ruminococcaceae, Pasteurellaceae, and Porphyromonadaceae, which are known to be related to inflammatory bowel disease and its subtypes according to the biological literature. Our findings can help generate potential hypotheses for future investigation of the heterogeneity of the human gut microbiome.
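To make the Dirichlet-multinomial layer mentioned in the abstract more concrete, the following minimal sketch simulates an over-dispersed count matrix in which each host belongs to one of two clusters. It is an illustration only and not the authors' model: it uses non-overlapping clusters, no taxonomic tree, and no inference step. The function names, the two concentration vectors, and all default values are hypothetical choices made for this example.

```python
import numpy as np


def sample_dirichlet_multinomial(rng, alpha, depth):
    """Draw one count vector: p ~ Dirichlet(alpha), counts ~ Multinomial(depth, p)."""
    p = rng.dirichlet(alpha)
    return rng.multinomial(depth, p)


def simulate_counts(n_samples=6, n_taxa=8, depth=1000, seed=0):
    """Simulate a hosts-by-taxa count matrix from a two-component mixture.

    Assumption for illustration: each host cluster favours one half of the
    taxa (large concentration) and rarely carries the other half (small
    concentration), which produces sparse, over-dispersed counts.
    """
    rng = np.random.default_rng(seed)
    half = n_taxa // 2
    alphas = np.array([
        [5.0] * half + [0.2] * (n_taxa - half),   # hypothetical cluster 0
        [0.2] * half + [5.0] * (n_taxa - half),   # hypothetical cluster 1
    ])
    # Assign each host to one cluster uniformly at random.
    labels = rng.integers(0, 2, size=n_samples)
    counts = np.vstack([
        sample_dirichlet_multinomial(rng, alphas[k], depth) for k in labels
    ])
    return labels, counts


if __name__ == "__main__":
    labels, counts = simulate_counts()
    print("cluster labels:", labels)
    print(counts)
```

Every row of the simulated matrix sums to the sequencing depth, which reflects the compositional nature of the data: only relative abundances are informative. Drawing the proportions from a Dirichlet distribution before the multinomial step is what adds variance beyond a plain multinomial model.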