Paper Title

Nonparametric Copula Models for Multivariate, Mixed, and Missing Data

Authors

Joseph Feldman, Daniel R. Kowal

Abstract

Modern datasets commonly feature both substantial missingness and many variables of mixed data types, which present significant challenges for estimation and inference. Complete case analysis, which proceeds using only the observations with fully observed variables, is often severely biased, while model-based imputation of missing values is limited by the ability of the model to capture complex dependencies among (possibly many) variables of mixed data types. To address these challenges, we develop a novel Bayesian mixture copula for joint and nonparametric modeling of multivariate count, continuous, ordinal, and unordered categorical variables, and deploy this model for inference, prediction, and imputation of missing data. Most uniquely, we introduce a new and computationally efficient strategy for marginal distribution estimation that eliminates the need to specify any marginal models yet delivers posterior consistency for each marginal distribution and the copula parameters under missingness-at-random. Extensive simulation studies demonstrate exceptional modeling and imputation capabilities relative to competing methods, especially with mixed data types, complex missingness mechanisms, and nonlinear dependencies. We conclude with a data analysis that highlights how improper treatment of missing data can distort a statistical analysis, and how the proposed approach offers a resolution.
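The abstract's claim that complete case analysis can be severely biased under missingness-at-random can be illustrated with a small simulation. The sketch below is not the authors' Bayesian mixture copula; it is a toy two-variable example in which the function name `complete_case_bias`, the logistic missingness mechanism, and the simple regression adjustment are all assumptions chosen for illustration. It compares the complete-case mean of `y` with an estimate that uses the always-observed variable `x`.

```python
import numpy as np

def complete_case_bias(n=20000, seed=0):
    """Toy illustration (hypothetical setup, not the paper's model)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)        # always observed
    y = x + rng.normal(size=n)    # true mean of y is 0

    # Missing at random: the chance that y is missing depends only on
    # the observed x (larger x makes y more likely to be missing).
    p_miss = 1.0 / (1.0 + np.exp(-2.0 * x))
    missing = rng.random(n) < p_miss

    # Complete case analysis: average y over rows where y is observed.
    cc_mean = y[~missing].mean()

    # Simple model-based adjustment: fit y ~ x on the complete cases,
    # then average the fitted values over all rows.
    slope, intercept = np.polyfit(x[~missing], y[~missing], 1)
    adj_mean = (intercept + slope * x).mean()

    return y.mean(), cc_mean, adj_mean

full_mean, cc_mean, adj_mean = complete_case_bias()
print(f"full data: {full_mean:.3f}")
print(f"complete cases: {cc_mean:.3f}")
print(f"regression-adjusted: {adj_mean:.3f}")
```

In this setup the complete-case mean is pulled well below zero because rows with large `x` are disproportionately dropped, while the adjusted estimate stays close to the full-data mean. The linear adjustment works here only because the toy relationship is linear; the mixed data types and nonlinear dependencies the abstract mentions are exactly where such a simple fix would be expected to fall short.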
