Robust Model-based Inference for Non-Probability Samples

Paper title

Robust Model-based Inference for Non-Probability Samples

Paper authors

Rafei, Ali, Elliott, Michael R., Flannagan, Carol A. C.

Abstract

With the ubiquitous availability of unstructured data, growing attention is being paid to how to adjust for selection bias in such non-probability samples. Most of the robust estimators proposed in prior literature are either fully or partially design-based, which may lead to inefficient estimates if outlying (pseudo-)weights are present. In addition, correctly reflecting the uncertainty of the adjusted estimator remains a challenge when the available reference survey has a complex sample design. This article proposes a fully model-based method for inference using non-probability samples, where the goal is to predict the outcome variable for all units in the population. We employ a Bayesian bootstrap method with Rubin's combining rules to derive the adjusted point and interval estimates. Using Gaussian process regression, our method allows for kernel matching between the non-probability sample units and population units based on the estimated selection propensities when the outcome model is misspecified. The repeated sampling properties of our method are evaluated through two Monte Carlo simulation studies. Finally, we apply it to a real-world non-probability sample with the aim of estimating crash-attributed injury rates in different body regions in the United States.
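The abstract pools Bayesian bootstrap replicates with Rubin's combining rules. As a rough illustration only, the sketch below applies the standard combining rules to M replicate point estimates and their within-replicate variances. It assumes those inputs already exist: how each replicate is produced (the Bayesian bootstrap over the complex reference survey and the Gaussian process outcome model) is not shown and is not the authors' code. It uses the classic degrees-of-freedom formula without the Barnard-Rubin small-sample adjustment, and it substitutes a normal quantile for the t quantile to stay within the Python standard library. The function name `rubin_combine` is hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance


def rubin_combine(estimates, variances, level=0.95):
    """Pool M replicate estimates with Rubin's combining rules (hypothetical helper)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two replicates with matching variances")
    q_bar = fmean(estimates)        # pooled point estimate
    u_bar = fmean(variances)        # average within-replicate variance
    b = variance(estimates)         # between-replicate variance (divisor M - 1)
    total = u_bar + (1 + 1 / m) * b  # total variance T
    # Classic Rubin degrees of freedom; infinite when replicates agree exactly.
    if b == 0:
        df = math.inf
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    # Normal approximation to the t reference distribution, for brevity only.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(total)
    return {
        "estimate": q_bar,
        "within": u_bar,
        "between": b,
        "total": total,
        "df": df,
        "ci": (q_bar - half_width, q_bar + half_width),
    }
```

For example, `rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` gives a pooled estimate of 2.0 and a total variance of 0.5 + (4/3)(1.0), about 1.833. With so few replicates the returned `df` is small, so a t quantile with that many degrees of freedom would give a wider, more appropriate interval than the normal approximation used here.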
