Paper Title
AdaFocal: Calibration-aware Adaptive Focal Loss
Paper Authors
Paper Abstract
Much recent work has been devoted to the problem of ensuring that a neural network's confidence scores match the true probability of being correct, i.e. the calibration problem. Of note, it was found that training with focal loss leads to better calibration than cross-entropy while achieving a similar level of accuracy \cite{mukhoti2020}. This success stems from focal loss regularizing the entropy of the model's predictions (controlled by the parameter $γ$), thereby reining in the model's overconfidence. Further improvement is expected if $γ$ is selected independently for each training sample (Sample-Dependent Focal Loss (FLSD-53) \cite{mukhoti2020}). However, FLSD-53 is based on heuristics and does not generalize well. In this paper, we propose a calibration-aware adaptive focal loss called AdaFocal that utilizes the calibration properties of focal (and inverse-focal) loss and adaptively modifies $γ_t$ for different groups of samples based on $γ_{t-1}$ from the previous step and the knowledge of the model's under/over-confidence on the validation set. We evaluate AdaFocal on various image recognition tasks and one NLP task, covering a wide variety of network architectures, to confirm the improvement in calibration while achieving similar levels of accuracy. Additionally, we show that models trained with AdaFocal achieve a significant boost in out-of-distribution detection.
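To make the mechanism concrete, the following is a minimal sketch of focal loss and a calibration-driven $γ$ update. The exact update rule, step size (`rate`), and clipping bounds here are illustrative assumptions, not the paper's published formula; the sketch only shows the general idea of growing $γ$ when a validation group is overconfident and shrinking it when underconfident.

```python
import math

def focal_loss(p_true, gamma):
    # Focal loss on the true-class probability: FL = -(1 - p)^gamma * log(p).
    # With gamma = 0 this reduces to standard cross-entropy.
    return -((1.0 - p_true) ** gamma) * math.log(p_true)

def update_gamma(gamma_prev, calib_error, rate=1.0, gamma_max=20.0):
    # Hypothetical multiplicative update: calib_error > 0 means the group is
    # overconfident on validation (confidence exceeds accuracy), so gamma grows
    # to regularize harder; calib_error < 0 shrinks gamma. Bounds are assumed.
    gamma = gamma_prev * math.exp(rate * calib_error)
    return min(gamma, gamma_max)
```

For example, with `gamma = 0` the loss on a sample predicted with probability 0.9 equals the cross-entropy `-log(0.9)`, while larger `gamma` down-weights such well-classified samples.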