Using leave-one-out cross-validation (LOO) in a multilevel regression and poststratification (MRP) workflow: A cautionary tale

Paper title

Using leave-one-out cross-validation (LOO) in a multilevel regression and poststratification (MRP) workflow: A cautionary tale

Authors

Swen Kuh, Lauren Kennedy, Qixuan Chen, Andrew Gelman

Abstract

In recent decades, multilevel regression and poststratification (MRP) has surged in popularity for population inference. However, the validity of the estimates can depend on details of the model, and there is currently little research on validation. We explore how leave-one-out cross-validation (LOO) can be used to compare Bayesian models for MRP. We investigate two approximate calculations of LOO, the Pareto smoothed importance sampling (PSIS-LOO) and a survey-weighted alternative (WTD-PSIS-LOO). Using two simulation designs, we examine how accurately these two criteria recover the correct ordering of model goodness at predicting population and small area level estimands. Focusing first on variable selection, we find that neither PSIS-LOO nor WTD-PSIS-LOO correctly recovers the models' order for an MRP population estimand (although both criteria correctly identify the best and worst model). When considering small-area estimation, the best model differs for different small areas, highlighting the complexity of MRP validation. When considering different priors, the models' order seems slightly better at smaller area levels. These findings suggest that while not terrible, PSIS-LOO-based ranking techniques may not be suitable to evaluate MRP as a method. We suggest this is due to the aggregation stage of MRP, where individual-level prediction errors average out. These results show that in practice, PSIS-LOO-based model validation tools need to be used with caution and might not convey the full story when validating MRP as a method.
