Paper Title

TAPA: A Scalable Task-Parallel Dataflow Programming Framework for Modern FPGAs with Co-Optimization of HLS and Physical Design

Authors

Guo, Licheng; Chi, Yuze; Lau, Jason; Song, Linghao; Tian, Xingyu; Khatti, Moazin; Qiao, Weikang; Wang, Jie; Ustun, Ecenur; Fang, Zhenman; Zhang, Zhiru; Cong, Jason

Abstract

In this paper, we propose TAPA, an end-to-end framework that compiles a C++ task-parallel dataflow program into a high-frequency FPGA accelerator. Compared to existing solutions, TAPA has two major advantages. First, TAPA provides a set of convenient APIs that allow users to easily express flexible and complex inter-task communication structures. Second, TAPA adopts a coarse-grained floorplanning step during HLS compilation for accurate pipelining of potential critical paths. In addition, TAPA implements several optimization techniques specifically tailored for modern HBM-based FPGAs. In our experiments with a total of 43 designs, we improve the average frequency from 147 MHz to 297 MHz (a 102% improvement) with no loss of throughput and a negligible change in resource utilization. Notably, in 16 experiments we make the originally unroutable designs achieve 274 MHz on average. The framework is available at https://github.com/UCLA-VAST/tapa and the core floorplan module is available at https://github.com/UCLA-VAST/AutoBridge.
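The abstract does not show what a task-parallel dataflow program looks like. As a rough illustration only, the following is a plain C++17 software model (std::thread plus a bounded FIFO), not TAPA's API and not how TAPA generates hardware. Tasks communicate solely through fixed-depth streams, and a top-level function instantiates the channels and launches every task concurrently. The names `Stream`, `Load`, `Scale`, `Store`, and `RunPipeline` are hypothetical; the real task and stream interfaces are documented in the TAPA repository.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical bounded FIFO modeling a stream channel between tasks.
// An empty std::optional acts as an end-of-transfer marker.
template <typename T>
class Stream {
 public:
  explicit Stream(std::size_t depth) : depth_(depth) {
    assert(depth > 0);  // a zero-depth FIFO could never accept data
  }

  // Blocks while the FIFO is full (back-pressure on the producer).
  void write(std::optional<T> v) {
    std::unique_lock<std::mutex> lock(mu_);
    not_full_.wait(lock, [&] { return q_.size() < depth_; });
    q_.push_back(std::move(v));
    not_empty_.notify_one();
  }

  // Blocks while the FIFO is empty.
  std::optional<T> read() {
    std::unique_lock<std::mutex> lock(mu_);
    not_empty_.wait(lock, [&] { return !q_.empty(); });
    std::optional<T> v = std::move(q_.front());
    q_.pop_front();
    not_full_.notify_one();
    return v;
  }

 private:
  std::size_t depth_;
  std::deque<std::optional<T>> q_;
  std::mutex mu_;
  std::condition_variable not_full_, not_empty_;
};

// Task 1: stream the input elements, then an end-of-transfer marker.
void Load(const std::vector<int>& in, Stream<int>& out) {
  for (int x : in) out.write(x);
  out.write(std::nullopt);
}

// Task 2: consume, transform, and forward until end-of-transfer.
void Scale(Stream<int>& in, Stream<int>& out, int factor) {
  while (auto v = in.read()) out.write(*v * factor);
  out.write(std::nullopt);
}

// Task 3: drain the final stream into memory.
void Store(Stream<int>& in, std::vector<int>& result) {
  while (auto v = in.read()) result.push_back(*v);
}

// Top level: instantiate the channels and run all tasks concurrently.
std::vector<int> RunPipeline(const std::vector<int>& input, int factor) {
  Stream<int> a(2), b(2);  // shallow FIFOs force producer/consumer overlap
  std::vector<int> result;
  std::thread t1(Load, std::cref(input), std::ref(a));
  std::thread t2(Scale, std::ref(a), std::ref(b), factor);
  std::thread t3(Store, std::ref(b), std::ref(result));
  t1.join();
  t2.join();
  t3.join();
  return result;
}
```

For example, `RunPipeline({1, 2, 3, 4, 5}, 10)` returns `{10, 20, 30, 40, 50}`. Because each stream has a single producer and a single consumer, element order is preserved, and since the task graph has no cycles, the blocking reads and writes cannot deadlock.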