Evolving Neural Architecture Using One Shot Model

Paper title

Evolving Neural Architecture Using One Shot Model

Authors

Nilotpal Sinha, Kuan-Wen Chen

Abstract

Neural Architecture Search (NAS) is emerging as a new research direction with the potential to replace hand-crafted neural architectures designed for specific tasks. Previous evolution-based architecture search methods require substantial computational resources, resulting in long search times. In this work, we propose a novel way of applying a simple genetic algorithm to the NAS problem, called EvNAS (Evolving Neural Architecture using One Shot Model), which reduces the search time significantly while still achieving better results than previous evolution-based methods. The architectures are represented using the architecture parameter of the one-shot model, which results in weight sharing among the architectures in a given population and in weight inheritance from one generation of architectures to the next. We propose a decoding technique for the architecture parameter, which is used to divert the majority of the gradient information towards the given architecture and also to improve the performance prediction of the given architecture from the one-shot model during the search process. Furthermore, we use the accuracy of the partially trained architecture on the validation data as a prediction of its fitness in order to reduce the search time. EvNAS searches for the architecture on the proxy dataset, CIFAR-10, for 4.4 GPU days on a single GPU and achieves a top-1 test error of 2.47% with 3.63M parameters; the architecture is then transferred to CIFAR-100 and ImageNet, achieving a top-1 error of 16.37% and a top-5 error of 7.4%, respectively. These results show the potential of evolutionary methods for solving the architecture search problem.
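The abstract describes a genetic algorithm over architecture parameters that are decoded into concrete architectures and scored by a fitness estimate. The toy sketch below illustrates only that general loop; it is not the EvNAS implementation. Everything in it is an assumption made for illustration: the sizes `NUM_EDGES` and `NUM_OPS`, the argmax-based `decode` rule, the uniform crossover and row-resampling mutation, the elitism step, and `toy_fitness`, which stands in for the validation accuracy of a partially trained architecture. No one-shot model, weight sharing, or training is modeled here.

```python
import random

NUM_EDGES = 4  # hypothetical number of edges in a cell
NUM_OPS = 3    # hypothetical number of candidate operations per edge
TARGET = [2, 0, 1, 2]  # hypothetical "ideal" architecture used by the toy fitness


def random_alpha(rng):
    """Sample a random architecture parameter: one row of op weights per edge."""
    return [[rng.random() for _ in range(NUM_OPS)] for _ in range(NUM_EDGES)]


def decode(alpha):
    """Assumed decoding rule: keep the highest-weighted operation on each edge."""
    return [max(range(len(row)), key=lambda op: row[op]) for row in alpha]


def toy_fitness(arch):
    """Stand-in for validation accuracy: number of edges matching TARGET."""
    return sum(op == want for op, want in zip(arch, TARGET))


def crossover(a, b, rng):
    """Uniform crossover: each edge's row is copied from one of the two parents."""
    return [list(ra) if rng.random() < 0.5 else list(rb) for ra, rb in zip(a, b)]


def mutate(alpha, rng, rate=0.1):
    """Re-sample each edge's row with probability `rate`."""
    return [[rng.random() for _ in row] if rng.random() < rate else list(row)
            for row in alpha]


def evolve(fitness, generations=20, pop_size=10, seed=0):
    """Run a simple elitist genetic algorithm and return (architecture, fitness)."""
    rng = random.Random(seed)

    def score(alpha):
        return fitness(decode(alpha))

    population = [random_alpha(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: max(2, pop_size // 2)]  # keep the fitter half as parents
        children = []
        while len(children) < pop_size - 1:
            p1, p2 = rng.sample(parents, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = [ranked[0]] + children  # elitism: the best individual survives
    best = max(population, key=score)
    return decode(best), score(best)


if __name__ == "__main__":
    arch, fit = evolve(toy_fitness)
    print("decoded architecture:", arch, "fitness:", fit)
```

Because the best individual is carried over unchanged each generation, the returned fitness never decreases as `generations` grows for a fixed seed. In a real search, `toy_fitness` would be replaced by an evaluation of the decoded architecture on validation data.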
