Paper Title
NASTransfer: Analyzing Architecture Transferability in Large Scale Neural Architecture Search
Paper Authors
Paper Abstract
Neural Architecture Search (NAS) is an open and challenging problem in machine learning. While NAS offers great promise, the prohibitive computational demands of most existing NAS methods make it difficult to directly search for architectures on large-scale tasks. The typical way of conducting large-scale NAS is to search for an architectural building block on a small dataset (either a proxy set drawn from the large dataset or a completely different small-scale dataset) and then transfer the block to the larger dataset. Despite a number of recent results that show the promise of transfer from proxy datasets, a comprehensive evaluation of different NAS methods that studies the impact of different source datasets has not yet been conducted. In this work, we propose to analyze the architecture transferability of different NAS methods by performing a series of experiments on large-scale benchmarks such as ImageNet1K and ImageNet22K. We find that: (i) The size and domain of the proxy set do not seem to influence architecture performance on the target dataset. On average, architectures searched using completely different small datasets (e.g., CIFAR10) transfer about as well as architectures searched directly on proxy target datasets. However, the design of the proxy set has a considerable impact on the ranking of different NAS methods. (ii) While different NAS methods show similar performance on a source dataset (e.g., CIFAR10), their transfer performance to a large dataset (e.g., ImageNet1K) differs significantly. (iii) Even on large datasets, the random sampling baseline is very competitive, but choosing an appropriate combination of proxy set and search strategy can provide significant improvement over it. We believe that our extensive empirical analysis will prove useful for the future design of NAS algorithms.
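To make the proxy-search-then-transfer protocol described in the abstract concrete, below is a minimal, self-contained Python sketch. It is not the paper's implementation: the cell search space, the proxy_score stub, and the scaling rule in transfer_to_target are all hypothetical stand-ins chosen only to illustrate the workflow (random search on a small proxy, then reuse of the searched block in a larger target configuration).

    # A minimal sketch (not the paper's code) of the proxy-search-then-transfer
    # protocol: search a building block cheaply, then stack it deeper/wider for
    # the large-scale target. All names and numbers here are illustrative.
    import random

    # Toy cell search space: each node picks an operation and an earlier input.
    OPS = ["sep_conv_3x3", "sep_conv_5x5", "max_pool_3x3", "skip_connect"]

    def sample_cell(num_nodes=4):
        """Randomly sample a cell: node i may connect to any earlier node."""
        return [(random.choice(OPS), random.randrange(i + 1)) for i in range(num_nodes)]

    def proxy_score(cell):
        """Stand-in for training/evaluating the cell on a small proxy dataset
        (e.g., CIFAR10 or a subset of ImageNet1K). A real run would train a
        shallow network built from `cell` and return validation accuracy."""
        rng = random.Random(str(cell))  # deterministic toy score per cell
        return rng.random()

    def search_on_proxy(budget=100):
        """Random-sampling baseline: draw `budget` cells, keep the best one."""
        return max((sample_cell() for _ in range(budget)), key=proxy_score)

    def transfer_to_target(cell, num_stacked_cells=14, channels=48):
        """'Transfer' the searched block: the same cell is stacked deeper and
        wider to form the large-scale (e.g., ImageNet1K) network, which would
        then be trained from scratch on the target dataset."""
        return {"cell": cell, "depth": num_stacked_cells, "width": channels}

    if __name__ == "__main__":
        best_cell = search_on_proxy(budget=100)
        target_net = transfer_to_target(best_cell)
        print("searched cell:", best_cell)
        print("target config:", {k: target_net[k] for k in ("depth", "width")})

In this framing, the paper's findings correspond to varying two pieces of the sketch: the dataset behind proxy_score (proxy set design) and the strategy inside search_on_proxy (random sampling versus other NAS search methods), then comparing the transferred networks on the large target dataset.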