Paper Title
Multi-fidelity Neural Architecture Search with Knowledge Distillation
Paper Authors
Paper Abstract
Neural architecture search (NAS) aims to find the optimal architecture of a neural network for a problem or a family of problems. Evaluating neural architectures is very time-consuming. One possible way to mitigate this issue is to use low-fidelity evaluations, namely training on a subset of the dataset, for fewer epochs, with fewer channels, etc. In this paper, we propose a Bayesian multi-fidelity method for neural architecture search: MF-KD. The method relies on a new approach to low-fidelity evaluation of neural architectures: training for a few epochs using knowledge distillation. Knowledge distillation adds to the loss function a term that forces the network to mimic a teacher network. We carry out experiments on CIFAR-10, CIFAR-100, and ImageNet-16-120. We show that training for a few epochs with such a modified loss function leads to a better selection of neural architectures than training for a few epochs with a logistic loss. The proposed method outperforms several state-of-the-art baselines.
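For illustration, below is a minimal PyTorch-style sketch of a standard knowledge-distillation loss in the spirit described above: the usual cross-entropy (logistic) loss plus a term that pushes the student's temperature-softened outputs toward those of a teacher network. The function name `kd_loss` and the `temperature`/`alpha` values are illustrative assumptions, not necessarily the exact formulation or hyperparameters used in MF-KD.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.9):
    """Sketch of a standard knowledge-distillation loss:
    a weighted sum of cross-entropy with the hard labels and a KL term
    matching the student's softened outputs to the teacher's."""
    # Hard-label term: ordinary cross-entropy (logistic loss).
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)
    kl = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    return (1.0 - alpha) * ce + alpha * kl
```

In a low-fidelity evaluation of this kind, the teacher's logits would typically be computed once under `torch.no_grad()` and the loss above would replace the plain cross-entropy during the few-epoch training of each candidate architecture.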