Paper Title
Dynamic Data-Free Knowledge Distillation by Easy-to-Hard Learning Strategy
Paper Authors
Paper Abstract
Data-free knowledge distillation (DFKD) is a widely used knowledge distillation (KD) strategy for settings where the original training data are unavailable. It trains a lightweight student model with the aid of a large pretrained teacher model, without any access to the training data. However, existing DFKD methods suffer from an inadequate and unstable training process, because they do not dynamically adjust the generation target according to the status of the student model during learning. To address this limitation, we propose a novel DFKD method called CuDFKD. It teaches the student with a dynamic strategy that gradually generates easy-to-hard pseudo samples, mirroring how humans learn. Besides, CuDFKD adapts the generation target dynamically according to the status of the student model. Moreover, we provide a theoretical analysis based on the majorization minimization (MM) algorithm and explain the convergence of CuDFKD. To measure the robustness and fidelity of DFKD methods, we propose two additional metrics, and experiments show that CuDFKD achieves performance comparable to state-of-the-art (SOTA) DFKD methods on all datasets. Experiments also show that CuDFKD achieves the fastest convergence and the best robustness among SOTA DFKD methods.
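
The abstract does not spell out the training loop, so the following is only a minimal sketch of the generic adversarial DFKD recipe it builds on, with a single "difficulty" knob standing in for the easy-to-hard curriculum. The Generator architecture, temperature, loss weighting, and schedule below are illustrative assumptions, not the paper's actual CuDFKD objective.

# Minimal sketch of adversarial data-free KD with an easy-to-hard knob.
# NOTE: architectures, losses, and the schedule are illustrative
# assumptions for exposition, not the paper's actual CuDFKD method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Maps noise z to pseudo images (hypothetical architecture)."""
    def __init__(self, z_dim=100, img_size=32, channels=3):
        super().__init__()
        self.shape = (channels, img_size, img_size)
        self.net = nn.Sequential(
            nn.Linear(z_dim, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, channels * img_size * img_size), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z).view(z.size(0), *self.shape)

def kl_to_teacher(s_logits, t_logits, T=4.0):
    """Temperature-scaled KL divergence used by standard KD."""
    return F.kl_div(
        F.log_softmax(s_logits / T, dim=1),
        F.softmax(t_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

def kd_step(teacher, student, gen, opt_s, opt_g, difficulty,
            z_dim=100, batch=64):
    """One DFKD step; difficulty in [0, 1] scales how strongly the
    generator is pushed toward samples the student gets wrong."""
    # 1) Generator step: maximize student-teacher disagreement, scaled
    #    by the curriculum knob (low difficulty -> easy samples).
    x = gen(torch.randn(batch, z_dim))
    with torch.no_grad():
        t_logits = teacher(x)
    g_loss = -difficulty * kl_to_teacher(student(x), t_logits)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

    # 2) Student step: match the teacher on a fresh pseudo batch.
    x = gen(torch.randn(batch, z_dim)).detach()
    with torch.no_grad():
        t_logits = teacher(x)
    s_loss = kl_to_teacher(student(x), t_logits)
    opt_s.zero_grad(); s_loss.backward(); opt_s.step()
    return s_loss.item()

# Toy teacher/student so the sketch runs end to end (placeholders for
# the pretrained teacher and lightweight student of the paper).
teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256),
                        nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
gen = Generator()
opt_s = torch.optim.SGD(student.parameters(), lr=0.1)
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-3)

# Illustrative easy-to-hard schedule: raise difficulty as the student's
# distillation loss falls, i.e., adapt the generation target to the
# student's current status.
difficulty = 0.0
for step in range(100):
    loss = kd_step(teacher, student, gen, opt_s, opt_g, difficulty)
    difficulty = min(1.0, difficulty + 0.01 * max(0.0, 1.0 - loss))

In this toy schedule, the generator barely fights the student at first (easy samples) and is pushed toward harder, higher-disagreement samples only as the student's distillation loss drops, which is the simplest reading of "adjusting the generation target based on the status of the student model".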
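For readers unfamiliar with majorization minimization, the generic scheme (a standard statement, not the paper's specific surrogate for CuDFKD) repeatedly minimizes an upper bound that touches the objective at the current iterate:

\[
\theta^{(t+1)} = \arg\min_{\theta} g\big(\theta \mid \theta^{(t)}\big),
\quad \text{where } g\big(\theta \mid \theta^{(t)}\big) \ge f(\theta) \ \forall \theta
\ \text{ and } \ g\big(\theta^{(t)} \mid \theta^{(t)}\big) = f\big(\theta^{(t)}\big).
\]

These two conditions give $f(\theta^{(t+1)}) \le g(\theta^{(t+1)} \mid \theta^{(t)}) \le g(\theta^{(t)} \mid \theta^{(t)}) = f(\theta^{(t)})$, so the objective never increases; this monotone-descent property is the standard starting point for convergence arguments of the kind the abstract refers to.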