Paper Title
Towards Realistic Out-of-Distribution Detection: A Novel Evaluation Framework for Improving Generalization in OOD Detection
Paper Authors
Paper Abstract
This paper presents a novel evaluation framework for Out-of-Distribution (OOD) detection that aims to assess the performance of machine learning models in more realistic settings. We observe that current testing protocols do not satisfy the real-world requirements for evaluating OOD detection methods: they typically encourage methods to develop a strong bias towards a low level of diversity in normal data. To address this limitation, we propose new OOD test datasets (CIFAR-10-R, CIFAR-100-R, and ImageNet-30-R) that allow researchers to benchmark OOD detection performance under realistic distribution shifts. Additionally, we introduce a Generalizability Score (GS) to measure the generalization ability of a model during OOD detection. Our experiments demonstrate that improving performance on existing benchmark datasets does not necessarily improve the usability of OOD detection models in real-world scenarios. While leveraging deep pre-trained features has been identified as a promising avenue for OOD detection research, our experiments show that state-of-the-art pre-trained models suffer a significant drop in performance when tested on the proposed datasets. To address this issue, we propose a post-processing stage that adapts pre-trained features under these distribution shifts before the OOD scores are computed, which significantly improves the performance of state-of-the-art pre-trained models on our benchmarks.
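To make the pipeline described above concrete, the sketch below shows one common way of scoring OOD samples with deep pre-trained features (a k-nearest-neighbor distance score) followed by a simple test-time re-standardization of the features standing in for the post-processing stage. This is a minimal illustration under assumptions, not the authors' actual method: the kNN score, the re-standardization adaptation, and the names `knn_ood_scores` and `adapt_features` are hypothetical choices made for exposition only.

```python
import numpy as np


def knn_ood_scores(train_feats, test_feats, k=5):
    """OOD score = Euclidean distance to the k-th nearest in-distribution
    training feature; higher scores suggest out-of-distribution samples."""
    # Pairwise distances between every test feature and every training feature.
    dists = np.linalg.norm(test_feats[:, None, :] - train_feats[None, :, :], axis=-1)
    # The k-th smallest distance per test sample is its OOD score.
    return np.sort(dists, axis=1)[:, k - 1]


def adapt_features(feats, shifted_batch):
    """Hypothetical post-processing: re-standardize features using statistics
    estimated from a batch collected under the shifted test distribution, so
    that OOD scores are computed in a feature space aligned with the shift."""
    mu = shifted_batch.mean(axis=0)
    sigma = shifted_batch.std(axis=0) + 1e-8
    return (feats - mu) / sigma


# Toy usage with random vectors standing in for backbone embeddings.
rng = np.random.default_rng(0)
id_train = rng.normal(size=(500, 128))            # in-distribution training features
id_shifted = rng.normal(size=(100, 128)) + 0.5    # ID test features under a realistic shift
ood_test = rng.normal(loc=3.0, size=(100, 128))   # out-of-distribution test features

# Adapt reference and test features with statistics from the shifted batch.
train_a = adapt_features(id_train, id_shifted)
id_a = adapt_features(id_shifted, id_shifted)
ood_a = adapt_features(ood_test, id_shifted)

print("shifted ID score:", knn_ood_scores(train_a, id_a).mean())   # lower on average
print("OOD score:", knn_ood_scores(train_a, ood_a).mean())         # higher on average
```

The sketch only conveys the general flow of scoring pre-trained features and adjusting them under distribution shift; the paper's actual adaptation procedure and the definition of the Generalizability Score are given in the paper itself.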