Paper title

Distribution-free binary classification: prediction sets, confidence intervals and calibration

Authors

Chirag Gupta, Aleksandr Podkopaev, Aaditya Ramdas

Abstract

We study three notions of uncertainty quantification -- calibration, confidence intervals and prediction sets -- for binary classification in the distribution-free setting, that is without making any distributional assumptions on the data. With a focus towards calibration, we establish a 'tripod' of theorems that connect these three notions for score-based classifiers. A direct implication is that distribution-free calibration is only possible, even asymptotically, using a scoring function whose level sets partition the feature space into at most countably many sets. Parametric calibration schemes such as variants of Platt scaling do not satisfy this requirement, while nonparametric schemes based on binning do. To close the loop, we derive distribution-free confidence intervals for binned probabilities for both fixed-width and uniform-mass binning. As a consequence of our 'tripod' theorems, these confidence intervals for binned probabilities lead to distribution-free calibration. We also derive extensions to settings with streaming data and covariate shift.
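To make the binning idea in the abstract concrete, here is a minimal sketch of uniform-mass binning with a per-bin confidence interval. It is an illustration under stated assumptions, not the paper's construction: the function names (`uniform_mass_edges`, `bin_index`, `binned_intervals`) are hypothetical, the interval is a plain Hoeffding bound with a union bound over bins (the paper's own intervals may be tighter), and the bin edges are assumed to be fitted on a split of the data separate from the one used to estimate the bin means.

```python
import bisect
import math
import random


def uniform_mass_edges(scores, num_bins):
    """Interior bin edges chosen so each bin holds roughly the same number of scores."""
    ordered = sorted(scores)
    n = len(ordered)
    return [ordered[(k * n) // num_bins] for k in range(1, num_bins)]


def bin_index(score, edges):
    """Index of the bin containing `score` (0 .. len(edges))."""
    return bisect.bisect_right(edges, score)


def binned_intervals(scores, labels, edges, alpha=0.1):
    """Per-bin (mean, lower, upper) for P(Y = 1 | bin).

    Assumption: a Hoeffding bound with a union bound over bins, so all
    intervals hold simultaneously with probability at least 1 - alpha.
    """
    num_bins = len(edges) + 1
    counts = [0] * num_bins
    positives = [0] * num_bins
    for s, y in zip(scores, labels):
        b = bin_index(s, edges)
        counts[b] += 1
        positives[b] += y

    out = []
    for n_b, pos in zip(counts, positives):
        if n_b == 0:
            # No data in this bin: only the trivial interval is valid.
            out.append((0.5, 0.0, 1.0))
            continue
        mean = pos / n_b
        half_width = math.sqrt(math.log(2 * num_bins / alpha) / (2 * n_b))
        out.append((mean, max(0.0, mean - half_width), min(1.0, mean + half_width)))
    return out


# Usage on synthetic data: scores are uniform, labels are Bernoulli(score).
random.seed(0)
demo_scores = [random.random() for _ in range(2000)]
demo_labels = [1 if random.random() < s else 0 for s in demo_scores]
demo_edges = uniform_mass_edges(demo_scores[:1000], num_bins=5)  # split 1: fit edges
demo_result = binned_intervals(demo_scores[1000:], demo_labels[1000:], demo_edges)  # split 2
```

The calibrated prediction for a new point is the mean of the bin its score falls into, so the calibrated predictor takes only finitely many values; the interval width shrinks as more points land in each bin.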
