Paper Title
APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
Authors
Abstract
We present APQ, a method for efficient deep learning inference on resource-constrained hardware. Unlike previous methods that search the neural architecture, pruning policy, and quantization policy separately, we optimize them jointly. To deal with the larger design space this brings, a promising approach is to train a quantization-aware accuracy predictor that quickly estimates the accuracy of a quantized model and feeds it to the search engine to select the best fit. However, training this predictor requires collecting a large number of quantized <model, accuracy> pairs, which involves quantization-aware finetuning and is therefore highly time-consuming. To tackle this challenge, we propose transferring knowledge from a full-precision (i.e., fp32) accuracy predictor to the quantization-aware (i.e., int8) accuracy predictor, which greatly improves sample efficiency. Moreover, collecting the dataset for the fp32 accuracy predictor only requires evaluating neural networks sampled from a pretrained once-for-all network, with no training cost, which is highly efficient. Extensive experiments on ImageNet demonstrate the benefits of our joint optimization approach. At the same accuracy, APQ reduces latency/energy by 2x/1.3x over MobileNetV2+HAQ. Compared to the separate optimization approach (ProxylessNAS+AMC+HAQ), APQ achieves 2.3% higher ImageNet accuracy while reducing GPU hours and CO2 emissions by orders of magnitude, pushing the frontier of environmentally friendly green AI. The code and video are publicly available.
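The predictor-transfer idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a linear predictor trained by gradient descent in place of the paper's accuracy predictor, a made-up binary encoding of architecture and quantization choices, and a synthetic accuracy function. All names (`sample`, `train`, `demo`) are hypothetical. It pretrains on many cheap fp32 pairs, then finetunes on a few int8 pairs, and reports held-out error next to a predictor trained from scratch on the same few int8 pairs.

```python
import random

def predict(w, x):
    """Linear accuracy predictor: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def mse(w, data):
    """Mean squared error of predictor w over (features, accuracy) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def train(w, data, steps):
    """Full-batch gradient descent on MSE, starting from weights w."""
    d = len(w)
    # Features lie in [0, 1], so the MSE gradient is at most 2d-Lipschitz
    # and this step size cannot increase the training loss.
    lr = 1.0 / (2 * d)
    w = list(w)  # copy so the caller's weights are left untouched
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in data:
            err = predict(w, x) - y
            for j in range(d):
                grad[j] += 2 * err * x[j] / len(data)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def sample(rng, n, d_arch, d_quant, quantized):
    """Synthetic <model, accuracy> pairs (an assumption, not real data).

    Each model is a binary architecture code, a binary quantization code
    (all zeros for fp32 models), and a constant bias feature.
    """
    data = []
    for _ in range(n):
        arch = [float(rng.randint(0, 1)) for _ in range(d_arch)]
        quant = [float(rng.randint(0, 1)) if quantized else 0.0
                 for _ in range(d_quant)]
        # Toy rule: larger architectures help, aggressive quantization hurts.
        acc = 0.6 + 0.02 * sum(arch) - 0.01 * sum(quant) + rng.gauss(0, 0.002)
        data.append((arch + quant + [1.0], acc))
    return data

def demo(seed=0, d_arch=6, d_quant=6):
    rng = random.Random(seed)
    d = d_arch + d_quant + 1
    fp32_pairs = sample(rng, 400, d_arch, d_quant, quantized=False)  # cheap
    int8_pairs = sample(rng, 20, d_arch, d_quant, quantized=True)    # costly
    held_out = sample(rng, 200, d_arch, d_quant, quantized=True)

    w_fp32 = train([0.0] * d, fp32_pairs, steps=200)      # pretrain on fp32
    w_transfer = train(w_fp32, int8_pairs, steps=200)     # finetune on int8
    w_scratch = train([0.0] * d, int8_pairs, steps=200)   # no transfer
    return {"transfer": mse(w_transfer, held_out),
            "scratch": mse(w_scratch, held_out)}

if __name__ == "__main__":
    for name, err in demo().items():
        print(f"{name}: held-out MSE = {err:.6f}")
```

Both input layouts share the same feature vector, so the pretrained weights carry over directly and only the quantization weights must be learned from the small int8 set. The numbers it prints describe this synthetic setup only and say nothing about the paper's reported results.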