Paper title
Checking individuals and sampling populations with imperfect tests
Paper authors
Paper abstract
In recent months, due to the Covid-19 emergency, questions about whether an individual belongs to a particular class ('infected' or 'not infected') after being tagged as 'positive' or 'negative' by a test have never been so popular. Similarly, there has been strong interest in estimating the proportion of a population expected to hold a given characteristic ('having or having had the virus'). Taking the cue from the many related discussions in the media, in addition to those in which we took part, we analyze these questions from a probabilistic ('Bayesian') perspective, considering several effects that play a role in evaluating the probabilities of interest. The resulting paper, written with didactic intent, is rather general and not strictly related to pandemics. The basic ideas of Bayesian inference are introduced, and the uncertainties on the performances of the tests are treated using the metrological concept of 'systematics' and propagated into the quantities of interest following the rules of probability theory. Separating the 'statistical' and 'systematic' contributions to the uncertainty on the inferred proportion of infectees allows one to optimize the sample size. The role of 'priors', often overlooked, is stressed; the use of 'flat priors' is nevertheless recommended, since the resulting posterior distribution can be 'reshaped' by an 'informative prior' in a later step. Details of the calculations are given and useful approximate formulae are derived, although the tough work is done with the help of direct Monte Carlo simulations and Markov Chain Monte Carlo, implemented in R and JAGS (relevant code provided in the appendix).
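The first question raised in the abstract, whether an individual tagged as 'positive' is actually infected, can be illustrated with a minimal sketch of Bayes' theorem. The function name and the numerical values of sensitivity, specificity and prior below are illustrative assumptions, not figures taken from the paper, and the sketch is written in Python rather than in the R and JAGS of the paper's appendix.

```python
def prob_infected_given_positive(prior, sensitivity, specificity):
    """Return P(infected | positive) from Bayes' theorem.

    prior       : P(infected) before the test result is known
    sensitivity : P(positive | infected)
    specificity : P(negative | not infected)
    """
    true_pos = sensitivity * prior                    # infected and positive
    false_pos = (1.0 - specificity) * (1.0 - prior)   # not infected but positive
    return true_pos / (true_pos + false_pos)


# Hypothetical test performances (assumed values, for illustration only).
SENSITIVITY = 0.98
SPECIFICITY = 0.88

# The same positive result supports very different conclusions
# depending on the prior probability of being infected.
for prior in (0.01, 0.10, 0.50):
    posterior = prob_infected_given_positive(prior, SENSITIVITY, SPECIFICITY)
    print(f"prior = {prior:.2f} -> P(infected | positive) = {posterior:.3f}")
```

The loop over priors is there to show why the abstract stresses the role of the prior: with a fixed, imperfect test, the probability of infection after a positive result depends strongly on how common the infection is assumed to be beforehand.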