Paper Title

Caffe Barista: Brewing Caffe with FPGAs in the Training Loop

Authors

Diederik Adriaan Vink, Aditya Rajagopal, Stylianos I. Venieris, Christos-Savvas Bouganis

Abstract

As the complexity of deep learning (DL) models increases, their compute requirements increase accordingly. Deploying a Convolutional Neural Network (CNN) involves two phases: training and inference. With the inference task typically taking place on resource-constrained devices, a lot of research has explored the field of low-power inference on custom hardware accelerators. On the other hand, training is both more compute- and memory-intensive and is primarily performed on power-hungry GPUs in large-scale data centres. CNN training on FPGAs is a nascent field of research. This is primarily due to the lack of tools to easily prototype and deploy various hardware and/or algorithmic techniques for power-efficient CNN training. This work presents Barista, an automated toolflow that provides seamless integration of FPGAs into the training of CNNs within the popular deep learning framework Caffe. To the best of our knowledge, this is the only tool that allows for such versatile and rapid deployment of hardware and algorithms for the FPGA-based training of CNNs, providing the necessary infrastructure for further research and development.
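
For context, the sketch below shows how CNN training is normally driven from Caffe's Python interface (pycaffe), i.e. the training loop that a toolflow like Barista plugs FPGA-backed layers into. This is not Barista's own API (the abstract does not describe it); the solver file name, device index, and the 'loss' blob name are assumed placeholders for illustration only.

```python
# Minimal sketch of a standard Caffe training loop via pycaffe.
# NOTE: this is illustrative only; 'solver.prototxt', the device index,
# and the 'loss' blob name are placeholders, and Barista's actual FPGA
# integration mechanism is not shown here.
import caffe

caffe.set_mode_gpu()   # baseline backend; Barista targets FPGA offload of layers instead
caffe.set_device(0)

# The solver prototxt references the network definition and training hyperparameters.
solver = caffe.SGDSolver('solver.prototxt')

for it in range(1000):
    solver.step(1)  # one forward pass, backward pass, and weight update
    if it % 100 == 0:
        print('iter', it, 'loss', float(solver.net.blobs['loss'].data))
```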
