Paper Title

Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks

Authors

Lei Yang, Zheyu Yan, Meng Li, Hyoukjun Kwon, Liangzhen Lai, Tushar Krishna, Vikas Chandra, Weiwen Jiang, Yiyu Shi

Abstract

Neural Architecture Search (NAS) has demonstrated its power on various AI accelerating platforms such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs). However, how to integrate NAS with Application-Specific Integrated Circuits (ASICs) remains an open problem, even though ASICs are the most powerful AI accelerating platforms. The major bottleneck comes from the large design freedom associated with ASIC designs. Moreover, because multiple DNNs will run in parallel for different workloads with diverse layer operations and sizes, integrating heterogeneous ASIC sub-accelerators for distinct DNNs in one design can significantly boost performance while further complicating the design space. To address these challenges, in this paper we build an ASIC template set based on existing successful designs, described by their unique dataflows, so that the design space is significantly reduced. Based on the templates, we further propose a framework, namely NASAIC, which can simultaneously identify multiple DNN architectures and the associated heterogeneous ASIC accelerator design, such that the design specifications (specs) are satisfied while accuracy is maximized. Experimental results show that, compared with successive NAS and ASIC design optimizations, which lead to design spec violations, NASAIC can guarantee that the results meet the design specs, with 17.77%, 2.49x, and 2.32x reductions in latency, energy, and area, respectively, and with 0.76% accuracy loss. To the best of the authors' knowledge, this is the first work on neural architecture and ASIC accelerator design co-exploration.
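
The contrast the abstract draws between successive optimization and co-exploration can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the candidate networks, the accelerator templates, the cost model, and all numbers are hypothetical. It also uses exhaustive enumeration over a single network rather than multiple tasks, because the abstract does not describe NASAIC's actual search strategy. It shows only why choosing the network first and the hardware second can violate a spec that a joint search would meet.

```python
from itertools import product

# Hypothetical candidate networks: accuracy and relative compute cost (made-up values).
ARCH_ACCURACY = {"small": 0.90, "medium": 0.93, "large": 0.95}
ARCH_WORK = {"small": 1.0, "medium": 2.0, "large": 4.0}

# Hypothetical accelerator templates: (speed factor, chip area), made-up values.
ACCELS = {"template_a": (1.0, 1.0), "template_b": (2.0, 2.5)}

# Hypothetical design specs: upper bounds on latency and area.
SPEC = {"latency": 2.0, "area": 2.0}


def latency(arch, accel):
    """Toy cost model: work divided by the template's speed factor."""
    return ARCH_WORK[arch] / ACCELS[accel][0]


def meets_spec(arch, accel):
    """True if the (network, accelerator) pair satisfies every spec."""
    area = ACCELS[accel][1]
    return latency(arch, accel) <= SPEC["latency"] and area <= SPEC["area"]


def successive():
    """Pick the most accurate network first, then the fastest accelerator for it."""
    arch = max(ARCH_ACCURACY, key=ARCH_ACCURACY.get)
    accel = min(ACCELS, key=lambda a: latency(arch, a))
    return arch, accel


def co_explore():
    """Search networks and accelerators jointly, keeping only spec-compliant pairs."""
    feasible = [p for p in product(ARCH_ACCURACY, ACCELS) if meets_spec(*p)]
    if not feasible:
        return None
    return max(feasible, key=lambda p: ARCH_ACCURACY[p[0]])


if __name__ == "__main__":
    for name, pair in (("successive", successive()), ("co-exploration", co_explore())):
        print(name, pair, "meets spec:", meets_spec(*pair))
```

In this toy setting the successive flow selects the largest network and then needs the larger template, which exceeds the area bound, while the joint search settles on a slightly less accurate network that fits within both bounds.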
