Paper Title

Residual Knowledge Distillation

Authors

Mengya Gao, Yujun Shen, Quanquan Li, Chen Change Loy

Abstract

Knowledge distillation (KD) is one of the most potent ways for model compression. The key idea is to transfer the knowledge from a deep teacher model (T) to a shallower student (S). However, existing methods suffer from performance degradation due to the substantial gap between the learning capacities of S and T. To remedy this problem, this work proposes Residual Knowledge Distillation (RKD), which further distills the knowledge by introducing an assistant (A). Specifically, S is trained to mimic the feature maps of T, and A aids this process by learning the residual error between them. In this way, S and A complement each other to get better knowledge from T. Furthermore, we devise an effective method to derive S and A from a given model without increasing the total computational cost. Extensive experiments show that our approach achieves appealing results on popular classification datasets, CIFAR-100 and ImageNet, surpassing state-of-the-art methods.
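
The abstract describes two coupled feature-matching objectives: the student S mimics the teacher's feature map, while the assistant A fits the residual error left by S, so that S + A together approximate T. Below is a minimal PyTorch sketch of that idea, assuming simple L2 feature-matching losses; the 1x1 projection layers used to align channel dimensions, the module names, and the equal loss weighting are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of a residual feature-distillation loss, assuming L2 matching.
# Projection layers, names, and loss weights are hypothetical illustrations.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualDistillationLoss(nn.Module):
    """Student mimics the teacher's feature map; an assistant fits the residual."""

    def __init__(self, s_channels: int, a_channels: int, t_channels: int):
        super().__init__()
        # 1x1 convolutions project student/assistant features into the teacher's channel space.
        self.proj_s = nn.Conv2d(s_channels, t_channels, kernel_size=1)
        self.proj_a = nn.Conv2d(a_channels, t_channels, kernel_size=1)

    def forward(self, feat_s, feat_a, feat_t):
        f_s = self.proj_s(feat_s)
        f_a = self.proj_a(feat_a)
        # Student tries to match the teacher's feature map directly.
        loss_s = F.mse_loss(f_s, feat_t)
        # Assistant tries to match what the student missed (the residual error).
        residual = (feat_t - f_s).detach()
        loss_a = F.mse_loss(f_a, residual)
        return loss_s + loss_a


if __name__ == "__main__":
    # Random tensors stand in for intermediate feature maps of S, A, and T.
    feat_s = torch.randn(2, 64, 8, 8)
    feat_a = torch.randn(2, 64, 8, 8)
    feat_t = torch.randn(2, 256, 8, 8)
    criterion = ResidualDistillationLoss(s_channels=64, a_channels=64, t_channels=256)
    print(criterion(feat_s, feat_a, feat_t))
```

Under this reading, the assistant only needs to model what the student fails to capture, which is why the two can complement each other without the assistant duplicating the student's capacity.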
