Paper title
Boosting Multivariate Structured Additive Distributional Regression Models
Paper authors
Abstract
We develop a model-based boosting approach for multivariate distributional regression within the framework of generalized additive models for location, scale, and shape. Our approach enables the simultaneous modeling of all parameters of an arbitrary parametric distribution for a multivariate response, conditional on explanatory variables, while remaining applicable to potentially high-dimensional data. Moreover, the boosting algorithm incorporates data-driven variable selection that takes various types of effects into account. A special merit of our approach is that it allows the association between multiple continuous or discrete outcomes to be modeled through the relevant covariates. After a detailed simulation study investigating estimation and prediction performance, we demonstrate the full flexibility of our approach in three diverse biomedical applications. The first is based on high-dimensional genomic cohort data from the UK Biobank and considers a bivariate binary response (chronic ischemic heart disease and high cholesterol). Here, we are able to identify genetic variants that are informative for the association between cholesterol and heart disease. The second application considers the demand for health care in Australia, with the number of consultations and the number of prescribed medications as a bivariate count response. The third application analyzes two dimensions of childhood undernutrition in Nigeria as a bivariate response; we find that the correlation between the two undernutrition scores differs considerably depending on the child's age and the region the child lives in.