Paper title
Fully-parallel Convolutional Neural Network Hardware
Paper authors
Paper abstract
A new trans-disciplinary knowledge area, Edge Artificial Intelligence or Edge Intelligence, is beginning to receive a tremendous amount of interest from the machine learning community due to the ever-increasing popularization of the Internet of Things (IoT). Unfortunately, incorporating AI capabilities into edge computing devices is hampered by the power and area demands of typical machine learning techniques such as Convolutional Neural Networks (CNNs). In this work, we propose a new power-and-area-efficient architecture for implementing Artificial Neural Networks (ANNs) in hardware, based on the exploitation of correlation phenomena in Stochastic Computing (SC) systems. The proposed architecture can solve the difficult implementation challenges that SC presents for CNN applications, such as the high resource usage of binary-to-stochastic conversion, the inaccuracy produced by undesired correlation between signals, and the implementation of the stochastic maximum function. Compared with traditional binary logic implementations, experimental results showed improvements of 19.6x and 6.3x in speed performance and energy efficiency, respectively, for the FPGA implementation. We have also realized a full VLSI implementation of the proposed SC-CNN architecture, demonstrating that our optimizations achieve an 18x area reduction over a previous SC-DNN architecture VLSI implementation in a comparable technological node. For the first time, a fully-parallel CNN such as LeNet-5 is embedded and tested in a single FPGA, showing the benefits of using stochastic computing for embedded applications, in contrast to traditional binary logic implementations.
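As a rough illustration of why correlation matters in stochastic computing, the following minimal sketch simulates unipolar bitstreams in software. It is not the authors' circuit: the function names (`to_stream`, `value`, `demo`), the comparator-based conversion, and the software random number generator standing in for a hardware one are all assumptions made for this example. It shows the general, well-known behavior that an AND gate multiplies two independent streams, while an OR gate on two streams generated from the same random sequence (maximally correlated) yields their maximum.

```python
import random


def to_stream(p, rands):
    """Hypothetical binary-to-stochastic converter: emit 1 when rand < p."""
    return [1 if r < p else 0 for r in rands]


def value(stream):
    """Decode a unipolar stream as the fraction of 1 bits."""
    return sum(stream) / len(stream)


def demo(a=0.3, b=0.7, n=20000, seed=0):
    rng = random.Random(seed)
    shared = [rng.random() for _ in range(n)]  # one random source, reused
    other = [rng.random() for _ in range(n)]   # a second, independent source

    a_s = to_stream(a, shared)
    b_corr = to_stream(b, shared)  # shares randomness with a_s (correlated)
    b_ind = to_stream(b, other)    # independent of a_s

    return {
        # Independent inputs: AND approximates the product a * b.
        "and_independent": value([x & y for x, y in zip(a_s, b_ind)]),
        # Correlated inputs: OR approximates max(a, b), AND approximates min(a, b).
        "or_correlated": value([x | y for x, y in zip(a_s, b_corr)]),
        "and_correlated": value([x & y for x, y in zip(a_s, b_corr)]),
    }


if __name__ == "__main__":
    for name, v in demo().items():
        print(f"{name}: {v:.3f}")
```

In this sketch, sharing one random sequence between converters both halves the number of random sources needed and turns a single gate into a maximum function, which is the kind of trade-off the abstract alludes to; the same sharing would corrupt a multiplication, which is why undesired correlation has to be controlled.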