Paper Title

QADAM: Quantization-Aware DNN Accelerator Modeling for Pareto-Optimality

Paper Authors

Ahmet Inci, Siri Garudanagiri Virupaksha, Aman Jain, Venkata Vivek Thallam, Ruizhou Ding, Diana Marculescu

Paper Abstract

As the machine learning and systems communities strive to achieve higher energy-efficiency through custom deep neural network (DNN) accelerators, varied bit precision or quantization levels, there is a need for design space exploration frameworks that incorporate quantization-aware processing elements (PE) into the accelerator design space while having accurate and fast power, performance, and area models. In this work, we present QADAM, a highly parameterized quantization-aware power, performance, and area modeling framework for DNN accelerators. Our framework can facilitate future research on design space exploration and Pareto-efficiency of DNN accelerators for various design choices such as bit precision, PE type, scratchpad sizes of PEs, global buffer size, number of total PEs, and DNN configurations. Our results show that different bit precisions and PE types lead to significant differences in terms of performance per area and energy. Specifically, our framework identifies a wide range of design points where performance per area and energy varies more than 5x and 35x, respectively. We also show that the proposed lightweight processing elements (LightPEs) consistently achieve Pareto-optimal results in terms of accuracy and hardware-efficiency. With the proposed framework, we show that LightPEs achieve on par accuracy results and up to 5.7x more performance per area and energy improvement when compared to the best INT16 based design.
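
The abstract lists the design knobs that QADAM parameterizes (bit precision, PE type, per-PE scratchpad size, global buffer size, total number of PEs, DNN configuration). As a rough illustration of what such a design-space sweep looks like, here is a minimal, hypothetical Python sketch; the AcceleratorConfig fields and the toy cost model are assumptions made for illustration only and are not QADAM's actual API or PPA models.

```python
# Hypothetical sketch of a quantization-aware design-space sweep, in the spirit of
# the parameterization described in the abstract. The cost model below is a toy
# placeholder for illustration, not QADAM's actual power/performance/area model.
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class AcceleratorConfig:
    bit_precision: int      # e.g., 4, 8, 16
    pe_type: str            # e.g., "INT" or "LightPE" (labels assumed here)
    scratchpad_kb: int      # per-PE scratchpad size
    global_buffer_kb: int   # shared global buffer size
    num_pes: int            # total number of processing elements


def placeholder_ppa(cfg: AcceleratorConfig) -> dict:
    """Toy area/energy/throughput estimates, used only to make the sweep runnable."""
    area = cfg.num_pes * (cfg.bit_precision ** 2) * 1e-3 + cfg.global_buffer_kb * 0.05
    energy = cfg.num_pes * cfg.bit_precision * 0.01 + cfg.scratchpad_kb * 0.002
    throughput = cfg.num_pes * (16 / cfg.bit_precision)  # lower precision -> more ops/cycle
    return {"perf_per_area": throughput / area, "perf_per_energy": throughput / energy}


def sweep():
    """Enumerate a small grid over the design knobs and score each point."""
    space = product([4, 8, 16], ["INT", "LightPE"], [1, 2], [64, 128], [64, 256])
    for bits, pe, spad, gbuf, pes in space:
        cfg = AcceleratorConfig(bits, pe, spad, gbuf, pes)
        yield cfg, placeholder_ppa(cfg)


if __name__ == "__main__":
    # Pick the design point with the best (toy) performance-per-area metric.
    best_cfg, best_metrics = max(sweep(), key=lambda x: x[1]["perf_per_area"])
    print("best (toy metric):", best_cfg, best_metrics)
```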
