Paper Title
Cold Posteriors and Aleatoric Uncertainty
Paper Authors
Paper Abstract
Recent work has observed that one can outperform exact inference in Bayesian neural networks by tuning the "temperature" of the posterior on a validation set (the "cold posterior" effect). To help interpret this phenomenon, we argue that commonly used priors in Bayesian neural networks can significantly overestimate the aleatoric uncertainty in the labels on many classification datasets. This problem is particularly pronounced in academic benchmarks like MNIST or CIFAR, for which the quality of the labels is high. For the special case of Gaussian process regression, any positive temperature corresponds to a valid posterior under a modified prior, and tuning this temperature is directly analogous to empirical Bayes. On classification tasks, there is no direct equivalence between modifying the prior and tuning the temperature; however, reducing the temperature can lead to models which better reflect our belief that one gains little information by relabeling existing examples in the training set. Therefore, although cold posteriors do not always correspond to an exact inference procedure, we believe they may often better reflect our true prior beliefs.
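The Gaussian-process claim in the abstract can be checked numerically: raising a Gaussian likelihood and prior to the power 1/T multiplies every precision by 1/T, which leaves the posterior mean unchanged and scales the posterior covariance by T; the same posterior is obtained exactly by rescaling the prior covariance K → TK and the noise variance σ² → Tσ². The sketch below verifies this on toy data with an RBF kernel (the data, kernel, and all names are illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data (illustrative, not from the paper).
X = rng.uniform(-3.0, 3.0, size=(8, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(8)

def rbf(A, B, ell=1.0):
    """Squared-exponential kernel with lengthscale ell."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def posterior(K, noise_var, y):
    """Gaussian posterior over f at the training inputs for y = f + eps."""
    prec = np.linalg.inv(K) + np.eye(len(y)) / noise_var
    cov = np.linalg.inv(prec)
    mean = cov @ (y / noise_var)
    return mean, cov

K = rbf(X, X) + 1e-8 * np.eye(8)  # jitter for numerical stability
sigma2, T = 0.1, 0.5              # noise variance and temperature

mean1, cov1 = posterior(K, sigma2, y)          # exact posterior at T = 1
mean2, cov2 = posterior(T * K, T * sigma2, y)  # modified prior and noise

# Tempering the posterior (power 1/T) keeps the mean and scales the
# covariance by T; the modified-prior posterior reproduces exactly that.
assert np.allclose(mean1, mean2)
assert np.allclose(T * cov1, cov2)
```

A cold posterior (T < 1) thus shrinks the GP's posterior covariance, corresponding to a prior and noise model that assert less aleatoric uncertainty in the labels, which is the abstract's point about high-quality benchmark labels.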