BCIM: Efficient Implementation of Binary Neural Network Based on Computation in Memory

Paper Title

BCIM: Efficient Implementation of Binary Neural Network Based on Computation in Memory

Authors

Mahdi Zahedi, Taha Shahroodi, Stephan Wong, Said Hamdioui

Abstract

Binary Neural Networks (BNNs) are promising for embedded systems with hard constraints on computing power. In contrast to conventional neural networks that use floating-point data types, BNNs use binarized weights and activations, which additionally reduces memory requirements. Memristors, emerging non-volatile memory devices, show great potential as the target implementation platform for BNNs because they integrate storage and compute units. The energy and performance improvements are mainly due to 1) accelerating matrix-matrix multiplication, the main kernel of BNNs, 2) diminishing the memory bottleneck of von Neumann architectures, and 3) enabling massive parallelization. However, the efficiency of this hardware depends heavily on how the network is mapped and executed on these devices. In this paper, we propose an efficient implementation of XNOR-based BNNs that maximizes parallelization while using a simple sensing scheme to generate activation values. In addition, a new mapping is introduced to minimize the overhead of data communication between convolution layers mapped to different memristor crossbars. This is accompanied by extensive analytical and simulation-based analysis that evaluates the implications of different design choices, taking the accuracy of the network into account. The results show that our approach achieves up to $10\times$ energy savings and $100\times$ improvement in latency compared to the state-of-the-art in-memory hardware design.
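
For readers unfamiliar with the term, the XNOR-based formulation mentioned in the abstract can be illustrated in software. The sketch below is a generic XNOR-and-popcount dot product in NumPy, not the paper's memristor crossbar mapping or its sensing scheme. The function names (`to_bits`, `xnor_popcount_dot`, `binary_neuron`) and the zero threshold are assumptions made for illustration only.

```python
import numpy as np

def to_bits(x):
    """Encode a {-1, +1} vector as {0, 1} bits (+1 -> 1, -1 -> 0)."""
    return (np.asarray(x) > 0).astype(np.uint8)

def xnor_popcount_dot(a_bits, w_bits):
    """Dot product of two {-1, +1} vectors given in {0, 1} encoding.

    XNOR marks positions where the bits agree; counting them (popcount)
    gives the number of +1 products, so dot = 2 * matches - n.
    """
    a_bits = np.asarray(a_bits)
    w_bits = np.asarray(w_bits)
    n = a_bits.size
    matches = int(np.count_nonzero(a_bits == w_bits))  # XNOR, then popcount
    return 2 * matches - n

def binary_neuron(a_bits, w_bits, threshold=0):
    """Binary activation: 1 if the dot product reaches the threshold, else 0.

    The threshold comparison is only a software stand-in (an assumption)
    for whatever circuit produces the activation in hardware.
    """
    return 1 if xnor_popcount_dot(a_bits, w_bits) >= threshold else 0

if __name__ == "__main__":
    a = np.array([1, -1, 1, 1])
    w = np.array([1, 1, -1, 1])
    print(xnor_popcount_dot(to_bits(a), to_bits(w)), int(np.dot(a, w)))
```

The identity `dot = 2 * matches - n` is what lets a multiply-accumulate over {-1, +1} values be replaced by a bitwise XNOR followed by a count, which is why this kernel is a natural fit for bit-level parallel hardware.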
