Paper Title

On effects of Knowledge Distillation on Transfer Learning

Paper Author

Thapa, Sushil

Paper Abstract

Knowledge distillation is a popular machine learning technique that aims to transfer knowledge from a large 'teacher' network to a smaller 'student' network and improve the student's performance by training it to emulate the teacher. In recent years, there has been significant progress in novel distillation techniques that push performance frontiers across multiple problems and benchmarks. Most of the reported work focuses on achieving state-of-the-art results on a specific problem; however, a significant gap remains in understanding the distillation process and how it behaves under particular training scenarios. Similarly, transfer learning (TL) is an effective technique for training neural networks faster on limited datasets by reusing representations learned from a different but related problem. Despite its effectiveness and popularity, the effect of knowledge distillation on transfer learning has not been explored much. In this thesis, we propose a machine learning architecture we call TL+KD that combines knowledge distillation with transfer learning; we then present a quantitative and qualitative comparison of TL+KD with TL in the domain of image classification. Through this work, we show that, by using guidance and knowledge from a larger teacher network during fine-tuning, we can improve the student network to achieve better validation performance, such as higher accuracy. We characterize the improvement in the model's validation performance using a variety of metrics beyond just accuracy scores, and study its behavior in scenarios such as input degradation.
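
The abstract describes TL+KD only at a high level. Below is a minimal sketch of how such a setup is commonly implemented, assuming a PyTorch fine-tuning loop in which an ImageNet-pretrained student (transfer learning) is trained with a standard Hinton-style distillation loss against a larger, frozen teacher (knowledge distillation). The architectures (ResNet-18 student, ResNet-50 teacher), class count, checkpoint name `teacher_finetuned.pt`, temperature `T`, and mixing weight `alpha` are illustrative assumptions, not values taken from the thesis.

```python
import torch
import torch.nn.functional as F
from torchvision import models

num_classes = 10  # illustrative; the thesis targets an image-classification task

# Transfer learning: start the student from an ImageNet-pretrained backbone.
student = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
student.fc = torch.nn.Linear(student.fc.in_features, num_classes)

# Knowledge distillation: a larger teacher, assumed already trained on the
# target task, is kept frozen and used only to produce soft targets.
teacher = models.resnet50(weights=None)
teacher.fc = torch.nn.Linear(teacher.fc.in_features, num_classes)
# teacher.load_state_dict(torch.load("teacher_finetuned.pt"))  # hypothetical checkpoint
teacher.eval()

T = 4.0       # softmax temperature for soft targets (illustrative)
alpha = 0.7   # weight on the distillation term (illustrative)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

def tl_kd_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One TL+KD fine-tuning step: hard-label cross-entropy plus a
    temperature-scaled KL term matching the student to the teacher."""
    student_logits = student(images)
    with torch.no_grad():
        teacher_logits = teacher(images)

    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    loss = alpha * kd + (1.0 - alpha) * ce
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Under these assumptions, plain TL fine-tuning corresponds to the same loop with `alpha = 0`, which is the baseline the thesis compares TL+KD against.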
