Paper Title

Spot-adaptive Knowledge Distillation

Authors

Song, Jie; Chen, Ying; Ye, Jingwen; Song, Mingli

Abstract

Knowledge distillation (KD) has become a well-established paradigm for compressing deep neural networks. The typical way of conducting knowledge distillation is to train the student network under the supervision of the teacher network to harness the knowledge at one or multiple spots (i.e., layers) in the teacher network. The distillation spots, once specified, will not change for all the training samples, throughout the whole distillation process. In this work, we argue that distillation spots should be adaptive to training samples and distillation epochs. We thus propose a new distillation strategy, termed spot-adaptive KD (SAKD), to adaptively determine the distillation spots in the teacher network per sample, at every training iteration during the whole distillation period. As SAKD actually focuses on "where to distill" instead of "what to distill", which is widely investigated by most existing works, it can be seamlessly integrated into existing distillation methods to further improve their performance. Extensive experiments with 10 state-of-the-art distillers are conducted to demonstrate the effectiveness of SAKD for improving their distillation performance, under both homogeneous and heterogeneous distillation settings. Code is available at https://github.com/zju-vipa/spot-adaptive-pytorch.
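To make the "where to distill" idea concrete, below is a minimal PyTorch-style sketch of per-sample spot selection: a tiny gating head scores each candidate spot (layer) for every sample, and only the selected spots contribute to the feature-distillation loss at that iteration. The gating head, the straight-through hard selection, and all names here are illustrative assumptions, not the authors' implementation; consult the spot-adaptive-pytorch repository above for the actual method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpotAdaptiveDistiller(nn.Module):
    """Sketch of per-sample, per-iteration distillation-spot selection.

    `student_feats` / `teacher_feats` are lists of same-shaped feature maps
    taken at candidate spots. A hypothetical linear gating head scores each
    spot per sample; only selected spots contribute to the distillation loss.
    """

    def __init__(self, num_spots: int, feat_dim: int):
        super().__init__()
        # Hypothetical gating head (not from the official code).
        self.gate = nn.Linear(feat_dim, num_spots)

    def forward(self, student_feats, teacher_feats, pooled_student):
        # pooled_student: (B, feat_dim) global descriptor used for gating.
        probs = torch.sigmoid(self.gate(pooled_student))   # (B, num_spots)
        # Hard per-sample spot selection; straight-through estimator keeps
        # the gate trainable (one of several possible design choices).
        hard = (probs > 0.5).float()
        gates = hard + probs - probs.detach()

        loss = pooled_student.new_zeros(())
        for i, (fs, ft) in enumerate(zip(student_feats, teacher_feats)):
            per_sample = F.mse_loss(fs, ft.detach(), reduction="none")
            per_sample = per_sample.flatten(1).mean(dim=1)  # (B,)
            loss = loss + (gates[:, i] * per_sample).mean()
        return loss


if __name__ == "__main__":
    B, C, H, W = 4, 32, 8, 8
    student_feats = [torch.randn(B, C, H, W) for _ in range(3)]
    teacher_feats = [torch.randn(B, C, H, W) for _ in range(3)]
    pooled = student_feats[-1].mean(dim=(2, 3))            # (B, C)
    distiller = SpotAdaptiveDistiller(num_spots=3, feat_dim=C)
    print(distiller(student_feats, teacher_feats, pooled))
```

Because the selection is recomputed from the current sample at every forward pass, the set of active spots can differ across samples and across training iterations, which is the behavior the abstract describes; any "what to distill" loss (here plain MSE) can be swapped in at the selected spots.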
