Paper Title

What is Your Metric Telling You? Evaluating Classifier Calibration under Context-Specific Definitions of Reliability

Paper Authors

John Kirchenbauer, Jacob Oaks, Eric Heim

Paper Abstract

Classifier calibration has received recent attention from the machine learning community, due both to its practical utility in facilitating decision making and to the observation that modern neural network classifiers are poorly calibrated. Much of this focus has been towards the goal of learning classifiers such that their output with the largest magnitude (the "predicted class") is calibrated. However, this narrow interpretation of classifier outputs does not adequately capture the variety of practical use cases in which classifiers can aid in decision making. In this work, we argue that more expressive metrics must be developed that accurately measure calibration error for the specific context in which a classifier will be deployed. To this end, we derive a number of different metrics using a generalization of Expected Calibration Error (ECE) that measure calibration error under different definitions of reliability. We then provide an extensive empirical evaluation of commonly used neural network architectures and calibration techniques with respect to these metrics. We find that: 1) definitions of ECE that focus solely on the predicted class fail to accurately measure calibration error under a selection of practically useful definitions of reliability, and 2) many common calibration techniques fail to improve calibration performance uniformly across ECE metrics derived from these diverse definitions of reliability.
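For reference, the quantity the abstract generalizes is the standard binned, top-label ECE: samples are bucketed by the confidence of the predicted class, and the gap between average confidence and empirical accuracy is averaged over bins, weighted by bin size. Below is a minimal NumPy sketch of that baseline metric, not code from the paper; the function name top_label_ece, the 15 equal-width bins, and the half-open binning convention are illustrative assumptions.

```python
import numpy as np

def top_label_ece(probs: np.ndarray, labels: np.ndarray, n_bins: int = 15) -> float:
    """Binned Expected Calibration Error over the predicted ("top") class.

    probs:  (n, k) array of per-class probabilities; labels: (n,) integer labels.
    An illustrative baseline sketch, not the paper's generalized metrics.
    """
    confidences = probs.max(axis=1)     # probability assigned to the predicted class
    predictions = probs.argmax(axis=1)  # the "predicted class"
    correct = (predictions == labels).astype(float)

    ece = 0.0
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)  # half-open bins (lo, hi]
        if in_bin.any():
            # weight |empirical accuracy - mean confidence| by the bin's share of samples
            ece += in_bin.mean() * abs(correct[in_bin].mean() - confidences[in_bin].mean())
    return float(ece)
```

The paper's argument is that this metric scores only one event per sample, correctness of the argmax, whereas its generalized ECE variants measure calibration under other, context-specific definitions of reliability.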
