Paper Title

Self-supervised Knowledge Distillation for Few-shot Learning

Paper Authors

Rajasegaran, Jathushan; Khan, Salman; Hayat, Munawar; Khan, Fahad Shahbaz; Shah, Mubarak

Paper Abstract

The real world contains an overwhelmingly large number of object classes, and learning all of them at once is infeasible. Few-shot learning is a promising learning paradigm due to its ability to learn out of order distributions quickly with only a few samples. Recent works [7, 41] show that simply learning a good feature embedding can outperform more sophisticated meta-learning and metric-learning algorithms for few-shot learning. In this paper, we propose a simple approach to improve the representation capacity of deep neural networks for few-shot learning tasks. We follow a two-stage learning process: first, we train a neural network to maximize the entropy of the feature embedding, thus creating an optimal output manifold, using a self-supervised auxiliary loss. In the second stage, we minimize the entropy of the feature embedding by bringing self-supervised twins together, while constraining the manifold with student-teacher distillation. Our experiments show that, even in the first stage, self-supervision can outperform current state-of-the-art methods, with further gains achieved by our second-stage distillation process. Our code is available at: https://github.com/brjathu/SKD.
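To make the two-stage objective described in the abstract more concrete, the sketch below is a hedged PyTorch-style interpretation, not the authors' released code. The rotation-prediction auxiliary task, the function names (generation_zero_loss, generation_one_loss), and the weights alpha, beta, and temperature are illustrative assumptions; the official implementation is at https://github.com/brjathu/SKD.

```python
# Minimal sketch of the two losses suggested by the abstract (assumptions, not the paper's code).
import torch.nn.functional as F

def generation_zero_loss(logits, rot_logits, labels, rot_labels, alpha=1.0):
    """Stage 1: supervised classification plus a self-supervised auxiliary
    loss (here assumed to be predicting the rotation applied to the input)."""
    ce = F.cross_entropy(logits, labels)          # supervised term
    ss = F.cross_entropy(rot_logits, rot_labels)  # self-supervised auxiliary term
    return ce + alpha * ss

def generation_one_loss(student_logits, teacher_logits, emb, emb_twin,
                        temperature=4.0, beta=0.1):
    """Stage 2: distill from the frozen Stage-1 teacher while pulling the
    embeddings of an image and its self-supervised 'twin' together."""
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2                          # standard knowledge-distillation term
    twin = F.mse_loss(emb, emb_twin)              # bring self-supervised twins together
    return kd + beta * twin
```

In this reading, Stage 1 trains the teacher network with generation_zero_loss; Stage 2 freezes that teacher and trains a student copy with generation_one_loss, so the distillation term constrains the manifold while the twin term reduces the entropy of the embedding.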
