Paper Title

A Hardware-Aware System for Accelerating Deep Neural Network Optimization

Authors

Anthony Sarah, Daniel Cummings, Sharath Nittur Sridhar, Sairam Sundaresan, Maciej Szankin, Tristan Webb, J. Pablo Munoz

Abstract

Recent advances in Neural Architecture Search (NAS) which extract specialized hardware-aware configurations (a.k.a. "sub-networks") from a hardware-agnostic "super-network" have become increasingly popular. While considerable effort has been employed towards improving the first stage, namely, the training of the super-network, the search for derivative high-performing sub-networks is still largely under-explored. For example, some recent network morphism techniques allow a super-network to be trained once and then have hardware-specific networks extracted from it as needed. These methods decouple the super-network training from the sub-network search and thus decrease the computational burden of specializing to different hardware platforms. We propose a comprehensive system that automatically and efficiently finds sub-networks from a pre-trained super-network that are optimized to different performance metrics and hardware configurations. By combining novel search tactics and algorithms with intelligent use of predictors, we significantly decrease the time needed to find optimal sub-networks from a given super-network. Further, our approach does not require the super-network to be refined for the target task a priori, thus allowing it to interface with any super-network. We demonstrate through extensive experiments that our system works seamlessly with existing state-of-the-art super-network training methods in multiple domains. Moreover, we show how novel search tactics paired with evolutionary algorithms can accelerate the search process for ResNet50, MobileNetV3 and Transformer while maintaining objective space Pareto front diversity and demonstrate an 8x faster search result than the state-of-the-art Bayesian optimization WeakNAS approach.
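The abstract describes a predictor-guided evolutionary search over sub-networks that maintains a Pareto front across competing objectives (e.g., accuracy vs. latency). The following is a minimal, hypothetical sketch of that general idea, not the authors' actual implementation: the search space, the toy predictors, and the selection loop are all illustrative assumptions standing in for the paper's learned predictors and search tactics.

```python
# Illustrative sketch of predictor-guided evolutionary sub-network search.
# All names and heuristics here are assumptions for demonstration only.
import random

# Hypothetical search space: each sub-network is a per-block choice of
# depth and width, as in elastic weight-sharing super-networks.
DEPTHS = [2, 3, 4]
WIDTHS = [3, 4, 6]
NUM_BLOCKS = 5
POP_SIZE = 50

def random_subnet():
    return [(random.choice(DEPTHS), random.choice(WIDTHS)) for _ in range(NUM_BLOCKS)]

def mutate(cfg, rate=0.2):
    # Resample each block's choice with a small probability.
    return [(random.choice(DEPTHS), random.choice(WIDTHS)) if random.random() < rate else gene
            for gene in cfg]

# Stand-in predictors: in a real system these would be cheap learned models
# (fit on a small set of measured sub-networks) that replace costly
# evaluation on the target hardware.
def predicted_accuracy(cfg):
    return sum(d * w for d, w in cfg) / (NUM_BLOCKS * max(DEPTHS) * max(WIDTHS))

def predicted_latency(cfg):
    return sum(d * w for d, w in cfg)  # toy proxy: larger sub-network -> slower

def dominates(a, b):
    # Objectives are (accuracy, latency): maximize the first, minimize the second.
    return a[0] >= b[0] and a[1] <= b[1] and a != b

def pareto_front(population):
    scored = [(cfg, (predicted_accuracy(cfg), predicted_latency(cfg))) for cfg in population]
    return [cfg for cfg, s in scored if not any(dominates(t, s) for _, t in scored)]

# Evolutionary loop: keep the predictor-scored Pareto front each generation,
# then refill the population by mutating survivors.
population = [random_subnet() for _ in range(POP_SIZE)]
for generation in range(20):
    front = pareto_front(population)
    population = front + [mutate(random.choice(front)) for _ in range(POP_SIZE - len(front))]

for cfg in pareto_front(population):
    print(cfg, predicted_accuracy(cfg), predicted_latency(cfg))
```

Because every candidate is scored by predictors rather than trained or benchmarked, each generation costs milliseconds; this is the mechanism that lets such systems explore large objective spaces orders of magnitude faster than evaluation-heavy baselines.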
