Title
Self-Distillation as Instance-Specific Label Smoothing
Authors
Abstract
It has recently been demonstrated that multi-generational self-distillation can improve generalization. Despite this intriguing observation, the reasons for the improvement remain poorly understood. In this paper, we first demonstrate experimentally that the improved performance of multi-generational self-distillation is in part associated with increasing diversity in teacher predictions. With this in mind, we offer a new interpretation of teacher-student training as amortized maximum a posteriori (MAP) estimation, under which teacher predictions enable instance-specific regularization. Our framework allows us to theoretically relate self-distillation to label smoothing, a commonly used technique that regularizes predictive uncertainty, and suggests the importance of predictive diversity in addition to predictive uncertainty. We present experimental results using multiple datasets and neural network architectures that, overall, demonstrate the utility of predictive diversity. Finally, we propose a novel instance-specific label smoothing technique that promotes predictive diversity without requiring a separately trained teacher model. We provide an empirical evaluation of the proposed method and find that it often outperforms classical label smoothing.
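To make the contrast in the abstract concrete, below is a minimal PyTorch sketch of the two kinds of soft targets being compared: classical label smoothing, which mixes the one-hot label with a uniform distribution (the same target for every instance of a class), and self-distillation, which mixes it with a teacher's prediction and therefore varies per example. The mixing weights `eps`, `alpha`, and `temperature` are illustrative assumptions, not values from the paper, and this sketch is not the authors' proposed instance-specific method.

```python
import torch
import torch.nn.functional as F

def label_smoothing_targets(labels: torch.Tensor, num_classes: int,
                            eps: float = 0.1) -> torch.Tensor:
    """Classical label smoothing: mix the one-hot target with a uniform
    distribution. Every instance of a given class gets the same target,
    so it regularizes predictive uncertainty but adds no diversity."""
    one_hot = F.one_hot(labels, num_classes).float()
    return (1.0 - eps) * one_hot + eps / num_classes

def self_distillation_targets(labels: torch.Tensor,
                              teacher_logits: torch.Tensor,
                              alpha: float = 0.5,
                              temperature: float = 1.0) -> torch.Tensor:
    """Self-distillation viewed as instance-specific smoothing: the one-hot
    target is mixed with the teacher's softened prediction, which differs
    across examples and thus acts as instance-specific regularization.
    alpha and temperature here are assumed hyperparameters."""
    num_classes = teacher_logits.size(-1)
    one_hot = F.one_hot(labels, num_classes).float()
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return (1.0 - alpha) * one_hot + alpha * teacher_probs

# Example: two inputs of the same class. Classical smoothing assigns them
# identical targets; the teacher-based targets differ per instance.
labels = torch.tensor([2, 2])
teacher_logits = torch.tensor([[0.1, 0.3, 2.0, -0.5],
                               [1.2, 0.0, 1.5, -0.3]])
print(label_smoothing_targets(labels, num_classes=4))
print(self_distillation_targets(labels, teacher_logits))
```

In both cases the student is trained with cross-entropy against these soft targets; the only difference is whether the smoothing distribution is uniform or teacher-derived, which is exactly the distinction between predictive uncertainty and predictive diversity drawn above.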