Paper Title
In-Network Accumulation: Extending the Role of NoC for DNN Acceleration
Paper Authors
Paper Abstract
Network-on-Chip (NoC) plays a significant role in the performance of a DNN accelerator. The scalability and modular design properties of the NoC help improve DNN execution performance by providing the flexibility to run different kinds of workloads. Data movement in a DNN workload remains a challenging task for DNN accelerators, so novel approaches are required. In this paper, we propose the In-Network Accumulation (INA) method to further accelerate DNN workload execution on a many-core spatial DNN accelerator under the Weight Stationary (WS) dataflow model. The INA method extends the router's function to support partial sum accumulation, avoiding the overhead of ejecting an incoming partial sum to the local processing element and re-injecting the updated value into the network. Simulation results on the AlexNet, ResNet-50, and VGG-16 workloads show that the proposed INA method achieves a 1.22x improvement in latency and a 2.16x improvement in power consumption under the WS dataflow model.
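The core idea described above can be illustrated with a minimal sketch. This is not the paper's simulator; the per-hop cycle costs (`EJECT`, `ACCUMULATE`, `INJECT`, `LINK`) and the function names are hypothetical, chosen only to contrast the baseline WS dataflow, where each hop ejects the partial sum to the local processing element and re-injects the updated value, with INA, where the router accumulates the local contribution in-flight:

```python
# Illustrative sketch only: hypothetical unit latencies per pipeline step.
EJECT, ACCUMULATE, INJECT, LINK = 1, 1, 1, 1

def baseline_ws(partial_sums):
    """Baseline: each hop ejects to the PE, accumulates, re-injects, traverses."""
    total, cycles = 0, 0
    for ps in partial_sums:
        total += ps
        cycles += EJECT + ACCUMULATE + INJECT + LINK
    return total, cycles

def ina_ws(partial_sums):
    """INA: the router adds the local partial sum in-flight, then traverses."""
    total, cycles = 0, 0
    for ps in partial_sums:
        total += ps
        cycles += ACCUMULATE + LINK
    return total, cycles

sums = [3, 1, 4, 1, 5]
assert baseline_ws(sums)[0] == ina_ws(sums)[0]  # same accumulated result
print(baseline_ws(sums)[1], ina_ws(sums)[1])    # INA spends fewer cycles per hop
```

Under these assumed unit costs both paths compute the same sum, but INA skips the injection/ejection steps at every intermediate hop, which is the source of the latency and power savings the abstract reports.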