Paper Title
ALPINE: Analog In-Memory Acceleration with Tight Processor Integration for Deep Learning
Paper Authors
Paper Abstract
Analog in-memory computing (AIMC) cores offer significant performance and energy benefits for neural network inference with respect to digital logic (e.g., CPUs). AIMC accelerates matrix-vector multiplications, which dominate the run-time of these applications. However, AIMC-centric platforms lack the flexibility of general-purpose systems, as they often have hard-coded data flows and can only support a limited set of processing functions. To bridge this flexibility gap, we present a novel system architecture that tightly integrates analog in-memory computing accelerators into multi-core CPUs in general-purpose systems. We developed ALPINE, a powerful full-system simulation framework built on the gem5-X simulator, which enables an in-depth characterization of the proposed architecture. ALPINE allows simulating the entire computer architecture stack, from the major hardware components to their interactions with the Linux OS. Within ALPINE, we define a custom ISA extension and a software library to facilitate the deployment of inference models. We showcase and analyze a variety of mappings of different neural network types, and demonstrate up to 20.5x/20.8x performance/energy gains with respect to a SIMD-enabled ARM CPU implementation for convolutional neural networks, multi-layer perceptrons, and recurrent neural networks.
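The abstract's premise is that matrix-vector multiplications (MVMs) dominate inference run-time, which is why offloading them to AIMC cores pays off. A minimal back-of-the-envelope sketch (not from the paper; the layer shapes below are hypothetical) illustrates this for a small MLP by comparing multiply-accumulate counts in the MVMs against the elementwise bias/activation work:

```python
# Sketch: why MVMs dominate MLP inference cost and are the natural
# target for AIMC offload. For a fully connected layer with n_in inputs
# and n_out outputs, the MVM (W @ x) costs n_in * n_out multiply-
# accumulates, while bias add + activation cost only O(n_out).

def layer_op_counts(n_in, n_out):
    mvm_ops = n_in * n_out   # multiply-accumulates in the matrix-vector product
    other_ops = 2 * n_out    # elementwise bias add + activation
    return mvm_ops, other_ops

# Hypothetical 3-layer MLP (e.g., an MNIST-sized classifier)
shapes = [(784, 256), (256, 256), (256, 10)]
mvm = sum(layer_op_counts(a, b)[0] for a, b in shapes)
other = sum(layer_op_counts(a, b)[1] for a, b in shapes)
print(f"MVM ops: {mvm}, other ops: {other}, "
      f"MVM share: {mvm / (mvm + other):.1%}")
```

Even for this small network, the MVMs account for well over 99% of the arithmetic, so accelerating them in analog memory while leaving the remaining elementwise work to the CPU, as the proposed architecture does, targets almost the entire compute budget.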