Paper Title

FFCNN: Fast FPGA based Acceleration for Convolution neural network inference

Authors

Keddous, F., Nguyen, H-N., Nakib, A.

Abstract


We present a new efficient OpenCL-based accelerator for large-scale convolutional neural networks, called Fast Inference on FPGAs for Convolution Neural Network (FFCNN). FFCNN is based on a deeply pipelined OpenCL kernel architecture. As pointed out in prior work, high-level synthesis tools such as the OpenCL framework can easily port code originally designed for CPUs and GPUs to FPGAs, but it is still difficult to make OpenCL code run efficiently on FPGAs. This work aims to propose an efficient FPGA implementation of OpenCL high-performance computing applications. To that end, data reuse and task mapping techniques are also presented to improve design efficiency. In addition, the following motivations were taken into account when developing FFCNN: 1) FFCNN is designed to be easily implemented with the Intel OpenCL SDK based FPGA design flow. 2) In FFCNN, different techniques have been integrated to improve memory bandwidth and throughput. A performance analysis is conducted on two deep CNNs for large-scale image classification. The obtained results, and the comparison with other works designed to accelerate the same types of architectures, show the efficiency and competitiveness of the proposed accelerator design through significantly improved performance and resource utilization.
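The "data reuse" idea the abstract mentions, caching previously read input rows on chip so each pixel is fetched from external memory only once, is commonly realized on FPGAs with a line buffer plus a sliding window. The sketch below illustrates that idiom in plain C for a 3x3 convolution; it is a conceptual model only, not the paper's actual kernel, and all names, sizes, and the kernel layout are our own assumptions.

```c
#include <string.h>

#define W 8   /* illustrative image width  */
#define H 6   /* illustrative image height */
#define K 3   /* kernel size (assumed)     */

/* Line-buffer convolution ("valid" region, no padding): each input
 * pixel is read exactly once; the K-1 previous rows are cached in
 * on-chip-style line buffers, and a KxK window slides over them.
 * Output is (H-K+1) x (W-K+1). */
void conv_linebuf(const int *in, const int k[K][K], int *out) {
    int linebuf[K - 1][W]; /* caches the two most recent rows */
    int window[K][K];      /* sliding KxK window              */
    memset(linebuf, 0, sizeof linebuf);
    memset(window, 0, sizeof window);

    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            int px = in[y * W + x]; /* the single external read */

            /* shift the window one column to the left */
            for (int r = 0; r < K; r++)
                for (int c = 0; c < K - 1; c++)
                    window[r][c] = window[r][c + 1];

            /* new rightmost column: two cached rows + fresh pixel */
            window[0][K - 1] = linebuf[0][x];
            window[1][K - 1] = linebuf[1][x];
            window[2][K - 1] = px;

            /* rotate the line buffers for the next row */
            linebuf[0][x] = linebuf[1][x];
            linebuf[1][x] = px;

            /* emit once the window lies fully inside the image */
            if (y >= K - 1 && x >= K - 1) {
                int acc = 0;
                for (int r = 0; r < K; r++)
                    for (int c = 0; c < K; c++)
                        acc += window[r][c] * k[r][c];
                out[(y - (K - 1)) * (W - K + 1) + (x - (K - 1))] = acc;
            }
        }
    }
}
```

In a hardware pipeline the two inner shift loops become fixed wiring (a shift register) rather than sequential work, so the loop body processes one pixel per cycle while external-memory traffic stays at one read per input pixel.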
