Bayesian predictive modeling of multi-source multi-way data

Paper title

Bayesian predictive modeling of multi-source multi-way data

Authors

Jonathan Kim, Brian J. Sandri, Raghavendra B. Rao, Eric F. Lock

Abstract

We develop a Bayesian approach to predict a continuous or binary outcome from data that are collected from multiple sources with a multi-way (i.e., multidimensional tensor) structure. As a motivating example, we consider molecular data from multiple 'omics sources, each measured over multiple developmental time points, as predictors of early-life iron deficiency (ID) in a rhesus monkey model. We use a linear model with a low-rank structure on the coefficients to capture multi-way dependence, and we model the variance of the coefficients separately for each source to infer the sources' relative contributions. Conjugate priors facilitate an efficient Gibbs sampling algorithm for posterior inference, assuming a continuous outcome with normal errors or a binary outcome with a probit link. Simulations demonstrate that our model performs as expected in terms of misclassification rates and correlation of estimated coefficients with true coefficients, with large gains in performance from incorporating multi-way structure and modest gains from accounting for differing signal sizes across the sources. Moreover, it provides robust classification of ID monkeys for our motivating application. Software in the form of R code is available at https://github.com/BiostatsKim/BayesMSMW.
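
To make the model structure in the abstract concrete, the following is a minimal Python sketch of a linear predictor with low-rank coefficients and a probit link. It is not the authors' R implementation and involves no Gibbs sampling: it only simulates the forward model. All names (low_rank_coefficients, linear_predictor, probit_probability), the rank-2 choice, the array sizes, and the per-source scales are assumptions made for illustration. Each source is taken to be a samples x features x time-points array, and its coefficient matrix is built as a product of feature loadings and time loadings, so its rank is at most R.

```python
import math
import numpy as np

def low_rank_coefficients(feature_loadings, time_loadings):
    """Return the P x D coefficient matrix sum_r u_r v_r^T (rank at most R).

    feature_loadings has shape (P, R); time_loadings has shape (D, R).
    """
    return feature_loadings @ time_loadings.T

def linear_predictor(sources, coefficients, intercept=0.0):
    """Sum the inner products <X_m[i], B_m> over all sources m, per sample i."""
    eta = np.full(sources[0].shape[0], intercept, dtype=float)
    for X, B in zip(sources, coefficients):
        # X: (N, P, D), B: (P, D) -> contribution of shape (N,)
        eta += np.einsum("ipd,pd->i", X, B)
    return eta

def probit_probability(eta):
    """Apply the standard normal CDF elementwise (probit link)."""
    return np.array([0.5 * (1.0 + math.erf(e / math.sqrt(2.0))) for e in eta])

# Hypothetical sizes: two sources with different numbers of features,
# both measured at the same three time points.
rng = np.random.default_rng(0)
n_samples, n_times, rank = 20, 3, 2
feature_counts = [50, 30]
# Assumed per-source standard deviations of the feature loadings, standing in
# for the idea that each source has its own coefficient variance.
source_scales = [1.0, 0.2]

sources, coefficients = [], []
for p, scale in zip(feature_counts, source_scales):
    U = rng.normal(scale=scale, size=(p, rank))   # feature loadings
    V = rng.normal(size=(n_times, rank))          # time-point loadings
    sources.append(rng.normal(size=(n_samples, p, n_times)))
    coefficients.append(low_rank_coefficients(U, V))

eta = linear_predictor(sources, coefficients)
prob = probit_probability(eta)                    # P(y_i = 1) under the probit link
y = (rng.uniform(size=n_samples) < prob).astype(int)
```

In this sketch the low-rank factorization replaces P x D free coefficients per source with R x (P + D) loadings, which is the kind of parameter reduction that a low-rank structure on the coefficients provides. The per-source scale plays the role of a source-specific coefficient variance; in a full Bayesian treatment it would be given a prior and inferred rather than fixed.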
