Paper Title
Non-Blocking Simultaneous Multithreading: Embracing the Resiliency of Deep Neural Networks
Paper Authors
Paper Abstract
Deep neural networks (DNNs) are known for their inability to utilize underlying hardware resources due to hardware susceptibility to sparse activations and weights. Even at finer granularities, many of the non-zero values hold a portion of zero-valued bits that may cause inefficiencies when executed on hardware. Inspired by conventional CPU simultaneous multithreading (SMT), which increases the utilization of computer resources by sharing them across several threads, we propose non-blocking SMT (NB-SMT), designated for DNN accelerators. Like conventional SMT, NB-SMT shares hardware resources among several execution flows. Yet, unlike SMT, NB-SMT is non-blocking, as it handles structural hazards by exploiting the algorithmic resiliency of DNNs. Instead of opportunistically dispatching instructions while they wait in a reservation station for available hardware, NB-SMT temporarily reduces the computation precision to accommodate all threads at once, enabling non-blocking operation. We demonstrate NB-SMT applicability using SySMT, an NB-SMT-enabled output-stationary systolic array (OS-SA). Compared with a conventional OS-SA, a 2-threaded SySMT consumes 1.4× the area and delivers a 2× speedup with 33% energy savings and less than 1% accuracy degradation of state-of-the-art CNNs on ImageNet. A 4-threaded SySMT consumes 2.5× the area and delivers, for example, a 3.4× speedup and 39% energy savings with 1% accuracy degradation on 40%-pruned ResNet-18.
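To make the abstract's key idea concrete, here is a minimal Python sketch of how a single processing element might behave under NB-SMT: operand pairs with a zero value are skipped, a lone surviving thread runs at full precision, and colliding threads trade precision for throughput instead of stalling. The function names (pe_cycle, squeeze_to_4bit) and the 8-bit-to-4-bit truncation policy are illustrative assumptions, not the paper's actual microarchitecture.

```python
def squeeze_to_4bit(x: int) -> int:
    """Keep the 4 most significant bits of an 8-bit operand.

    Illustrative precision-reduction policy; an assumption for this
    sketch, not the paper's exact rounding scheme."""
    return (x >> 4) << 4


def pe_cycle(thread_ops: list[tuple[int, int]]) -> int:
    """Model one cycle of a single-MAC processing element under NB-SMT.

    thread_ops holds one (activation, weight) pair per thread. Pairs with
    a zero operand need no MAC. If at most one pair survives, it executes
    at full 8-bit precision. If several pairs collide on the single MAC,
    all of them run at reduced precision rather than blocking, which is
    the non-blocking behavior the abstract describes."""
    live = [(a, w) for a, w in thread_ops if a != 0 and w != 0]
    if len(live) <= 1:
        # No structural hazard: full-precision MAC (or an idle cycle).
        return sum(a * w for a, w in live)
    # Structural hazard: temporarily trade precision for throughput.
    return sum(squeeze_to_4bit(a) * squeeze_to_4bit(w) for a, w in live)


# Thread 0 hits a zero weight, so thread 1 gets the MAC at full precision.
assert pe_cycle([(37, 0), (92, 150)]) == 92 * 150
# Both threads are live: each MAC runs on 4-bit-truncated operands.
print(pe_cycle([(37, 110), (92, 150)]))  # 14592, vs. 17870 exact
```

The sketch mirrors the trade-off the abstract reports: throughput doubles on collisions at the cost of bounded numerical error, which DNN resiliency absorbs with little accuracy loss.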