Paper Title

Confidence Intervals for Unobserved Events

Authors

Painsky, Amichai

Abstract

Consider a finite sample from an unknown distribution over a countable alphabet. Unobserved events are alphabet symbols that do not appear in the sample. Estimating the probabilities of unobserved events is a basic problem in statistics and related fields, and it has been studied extensively in the context of point estimation. In this work we introduce a novel interval estimation scheme for unobserved events. Our proposed framework applies selective inference, since the set of parameters for which we construct confidence intervals (CIs), namely the probabilities of the unobserved symbols, is itself determined by the sample. Interestingly, we show that the obtained CIs are dimension-free, as they do not grow with the alphabet size. Further, we show that these CIs are (almost) tight, in the sense that they cannot be further improved without violating the prescribed coverage rate. We demonstrate the performance of our proposed scheme in synthetic and real-world experiments, showing a significant improvement over the alternatives. Finally, we apply our proposed scheme to large alphabet modeling. We introduce a novel simultaneous CI scheme for large alphabet distributions, which outperforms currently known methods while maintaining the prescribed coverage rate.
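To make the setting concrete, the sketch below shows two classical baselines; it is not the author's scheme. The first is the Good-Turing point estimate of the missing mass (the total probability of all unobserved symbols), N1/n, where N1 is the number of symbols seen exactly once. The second is the exact one-sided upper confidence bound for a single symbol that was fixed in advance and then observed zero times in n draws, obtained by solving (1 - p)^n = alpha. That bound does not account for the data-dependent selection of which symbols are unobserved, which is the gap a selective-inference treatment addresses. The function names are hypothetical and chosen for illustration.

```python
from collections import Counter


def good_turing_missing_mass(sample):
    """Classical Good-Turing point estimate of the missing mass: N1 / n.

    N1 = number of distinct symbols that appear exactly once in the sample,
    n  = sample size. (Illustrative baseline, not the paper's method.)
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / n


def zero_count_upper_bound(n, alpha=0.05):
    """Exact one-sided (1 - alpha) upper bound for the probability p of ONE
    pre-specified symbol that had zero occurrences in n i.i.d. draws.

    Solves P(count = 0 | p) = (1 - p)**n = alpha for p, i.e. the
    Clopper-Pearson upper bound at x = 0. It ignores selection: its coverage
    guarantee holds only for a symbol chosen before looking at the data.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    return 1.0 - alpha ** (1.0 / n)


if __name__ == "__main__":
    # "abracadabra": a x5, b x2, r x2, c x1, d x1 -> N1 = 2, n = 11
    print(f"{good_turing_missing_mass(list('abracadabra')):.4f}")  # 0.1818
    # Close to the "rule of three" approximation 3/n for n = 100
    print(f"{zero_count_upper_bound(100, alpha=0.05):.4f}")  # 0.0295
```

For large n the single-symbol bound behaves like log(1/alpha)/n (about 3/n at alpha = 0.05), which is why it is often quoted as the "rule of three".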