Paper Title

Knowledge Condensation Distillation

Paper Authors

Chenxin Li, Mingbao Lin, Zhiyuan Ding, Nie Lin, Yihong Zhuang, Yue Huang, Xinghao Ding, Liujuan Cao

Paper Abstract

Knowledge Distillation (KD) transfers knowledge from a high-capacity teacher network to strengthen a smaller student. Existing methods focus on excavating knowledge hints and transferring the whole of the knowledge to the student. However, knowledge redundancy arises because the knowledge holds different value for the student at different learning stages. In this paper, we propose Knowledge Condensation Distillation (KCD). Specifically, the knowledge value of each sample is dynamically estimated, based on which an Expectation-Maximization (EM) framework is forged to iteratively condense a compact knowledge set from the teacher to guide the student's learning. Our approach is easy to build on top of off-the-shelf KD methods, with no extra training parameters and negligible computation overhead. It thus presents a new perspective on KD, in which a student that actively identifies the teacher's knowledge in line with its aptitude can learn to learn more effectively and efficiently. Experiments on standard benchmarks show that the proposed KCD boosts the performance of the student model with even higher distillation efficiency. Code is available at https://github.com/dzy3/KCD.
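
The abstract describes two ingredients: a per-sample estimate of knowledge value and an EM-style loop that condenses a compact knowledge set from the teacher. The PyTorch sketch below illustrates one way such a loop could be organized; the value proxy (teacher-student KL divergence), the keep_ratio parameter, and the function names are illustrative assumptions rather than the paper's exact formulation — see the official code at https://github.com/dzy3/KCD for the actual implementation.

```python
import torch
import torch.nn.functional as F

def estimate_knowledge_value(student, teacher, loader, device="cuda"):
    """Sketch of the E-step: score how valuable the teacher's knowledge on each
    sample still is for the current student. The KL divergence between teacher
    and student soft predictions is used as an illustrative proxy here; the
    paper's exact value estimate may differ."""
    values = []
    student.eval()
    teacher.eval()
    with torch.no_grad():
        for x, _ in loader:  # loader is assumed to iterate in a fixed (unshuffled) order
            x = x.to(device)
            t_prob = F.softmax(teacher(x), dim=1)
            s_logprob = F.log_softmax(student(x), dim=1)
            kl = F.kl_div(s_logprob, t_prob, reduction="none").sum(dim=1)
            values.append(kl.cpu())
    return torch.cat(values)  # one score per training sample


def condense_knowledge(values, keep_ratio=0.7):
    """Sketch of the condensation step: keep the samples whose teacher knowledge
    is currently most valuable. keep_ratio is a hypothetical hyper-parameter,
    not the paper's setting."""
    k = int(len(values) * keep_ratio)
    return torch.topk(values, k).indices  # indices of the condensed knowledge set
```

A full round would then alternate these steps with ordinary distillation: estimate the values, build a data loader over the selected indices, run any off-the-shelf KD loss on that condensed subset, and re-estimate as the student's aptitude changes.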
