Paper title
TAPA: A Scalable Task-Parallel Dataflow Programming Framework for Modern FPGAs with Co-Optimization of HLS and Physical Design
Authors
Abstract
In this paper, we propose TAPA, an end-to-end framework that compiles a C++ task-parallel dataflow program into a high-frequency FPGA accelerator. Compared to existing solutions, TAPA has two major advantages. First, TAPA provides a set of convenient APIs that allow users to easily express flexible and complex inter-task communication structures. Second, TAPA adopts a coarse-grained floorplanning step during HLS compilation for accurate pipelining of potential critical paths. In addition, TAPA implements several optimization techniques specifically tailored for modern HBM-based FPGAs. In our experiments with a total of 43 designs, we improve the average frequency from 147 MHz to 297 MHz (a 102% improvement) with no loss of throughput and a negligible change in resource utilization. Notably, in 16 experiments we make the originally unroutable designs achieve 274 MHz on average. The framework is available at https://github.com/UCLA-VAST/tapa and the core floorplan module is available at https://github.com/UCLA-VAST/AutoBridge.
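To make the term "task-parallel dataflow program" concrete, here is a minimal software-only sketch in standard C++. It does not use TAPA's API: `Channel`, `Producer`, `Scale`, `Consumer`, and `RunPipeline` are hypothetical names introduced for illustration, and host threads stand in for the hardware tasks that a framework of this kind would synthesize. The sketch only shows the programming model the abstract refers to, namely independent tasks that run concurrently and communicate solely through bounded FIFO streams.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded FIFO standing in for an on-chip stream between tasks.
// This is an illustration only, not TAPA's stream type.
template <typename T>
class Channel {
 public:
  explicit Channel(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Blocks while the FIFO is full, mimicking back-pressure on a stream.
  void Write(T value) {
    std::unique_lock<std::mutex> lock(mu_);
    not_full_.wait(lock, [this] { return queue_.size() < capacity_; });
    queue_.push(std::move(value));
    not_empty_.notify_one();
  }

  // Marks the end of the data; readers drain what is left, then stop.
  void Close() {
    std::lock_guard<std::mutex> lock(mu_);
    closed_ = true;
    not_empty_.notify_all();
  }

  // Blocks until a value arrives; returns nullopt once closed and drained.
  std::optional<T> Read() {
    std::unique_lock<std::mutex> lock(mu_);
    not_empty_.wait(lock, [this] { return !queue_.empty() || closed_; });
    if (queue_.empty()) return std::nullopt;
    T value = std::move(queue_.front());
    queue_.pop();
    not_full_.notify_one();
    return value;
  }

 private:
  std::size_t capacity_;
  bool closed_ = false;
  std::queue<T> queue_;
  std::mutex mu_;
  std::condition_variable not_full_;
  std::condition_variable not_empty_;
};

// Task 1: streams the input into the pipeline.
void Producer(const std::vector<int>& input, Channel<int>& out) {
  for (int v : input) out.Write(v);
  out.Close();
}

// Task 2: multiplies every element by a constant factor.
void Scale(int factor, Channel<int>& in, Channel<int>& out) {
  while (auto v = in.Read()) out.Write(*v * factor);
  out.Close();
}

// Task 3: collects the results in arrival order.
void Consumer(Channel<int>& in, std::vector<int>& output) {
  while (auto v = in.Read()) output.push_back(*v);
}

// Top level: instantiates the streams and runs the three tasks in parallel.
std::vector<int> RunPipeline(const std::vector<int>& input, int factor) {
  Channel<int> a(2);
  Channel<int> b(2);
  std::vector<int> output;
  std::thread t1(Producer, std::cref(input), std::ref(a));
  std::thread t2(Scale, factor, std::ref(a), std::ref(b));
  std::thread t3(Consumer, std::ref(b), std::ref(output));
  t1.join();
  t2.join();
  t3.join();
  return output;
}
```

Calling `RunPipeline({1, 2, 3, 4, 5}, 3)` pushes five values through the three tasks and returns them scaled by three, in order. The small FIFO capacity is deliberate: because tasks only interact through streams, each stream is a natural place where a compiler could insert pipeline registers without changing program behavior, which is the kind of freedom the floorplanning step described in the abstract relies on.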