Paper Title


Self-Distillation from the Last Mini-Batch for Consistency Regularization

Paper Authors

Yiqing Shen, Liwu Xu, Yuzhe Yang, Yaqian Li, Yandong Guo

Paper Abstract


Knowledge distillation (KD) shows bright promise as a powerful regularization strategy for boosting generalization ability by leveraging learned sample-level soft targets. Yet, employing a complex pre-trained teacher network or an ensemble of peer students in existing KD is both time-consuming and computationally costly. Various self-KD methods have been proposed to achieve higher distillation efficiency. However, they either require extra network architecture modification or are difficult to parallelize. To cope with these challenges, we propose an efficient and reliable self-distillation framework, named Self-Distillation from Last Mini-Batch (DLB). Specifically, we rearrange the sequential sampling by constraining half of each mini-batch to coincide with the previous iteration. Meanwhile, the remaining half coincides with the upcoming iteration. Afterwards, the former half of the mini-batch distills on-the-fly soft targets generated in the previous iteration. Our proposed mechanism guides training stability and consistency, resulting in robustness to label noise. Moreover, our method is easy to implement, without taking up extra run-time memory or requiring model structure modification. Experimental results on three classification benchmarks illustrate that our approach consistently outperforms state-of-the-art self-distillation approaches across different network architectures. Additionally, our method shows strong compatibility with augmentation strategies, gaining additional performance improvements. The code is available at https://github.com/Meta-knowledge-Lab/DLB.
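
To make the mechanism described in the abstract more concrete, below is a minimal PyTorch-style sketch of the carried-half training loop, assuming a standard classification setup. It is only an illustration of the idea, not the authors' official implementation (see the linked repository for that); the function name dlb_train and the hyperparameters alpha (distillation weight) and T (temperature) are placeholders.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader


def dlb_train(model, train_set, epochs=1, batch_size=128, alpha=1.0, T=3.0, device="cpu"):
    """Hypothetical sketch: each iteration re-uses half of the previous iteration's
    samples and regularizes them toward the soft targets (logits) the model produced
    for those same samples one step earlier."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    # The loader yields half-batches; each full mini-batch is a new half plus the
    # half carried over from the last iteration.
    loader = DataLoader(train_set, batch_size=batch_size // 2, shuffle=True, drop_last=True)
    prev_x = prev_y = prev_logits = None
    model.to(device).train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            if prev_logits is None:
                # First iteration: no carried-over half to distill from yet.
                logits = model(x)
                loss = F.cross_entropy(logits, y)
            else:
                inputs = torch.cat([x, prev_x])
                targets = torch.cat([y, prev_y])
                logits = model(inputs)
                ce = F.cross_entropy(logits, targets)
                # Consistency term: the carried-over half (the tail of the batch)
                # is matched to the temperature-softened logits cached one iteration ago.
                kd = F.kl_div(
                    F.log_softmax(logits[x.size(0):] / T, dim=1),
                    F.softmax(prev_logits / T, dim=1),
                    reduction="batchmean",
                ) * (T * T)
                loss = ce + alpha * kd
            opt.zero_grad()
            loss.backward()
            opt.step()
            # Cache the new half and its just-computed logits for the next iteration.
            prev_x, prev_y = x, y
            prev_logits = logits[: x.size(0)].detach()
    return model
```

Note that the cached logits are detached, so the consistency term only regularizes the current update toward the last iteration's soft targets without backpropagating through the earlier step, which keeps memory overhead negligible.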
