Paper Title
StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems
Paper Authors
Paper Abstract
Spatial computing devices have been shown to significantly accelerate stencil computations, but have so far relied on unrolling the iterative dimension of a single stencil operation to increase temporal locality. This work considers the general case of mapping directed acyclic graphs of heterogeneous stencil computations to spatial computing systems, assuming large input programs without an iterative component. StencilFlow maximizes temporal locality and ensures deadlock freedom in this setting, providing end-to-end analysis and mapping from a high-level program description to distributed hardware. We evaluate our generated architectures on a Stratix 10 FPGA testbed, yielding 1.31 TOp/s and 4.18 TOp/s on single-device and multi-device, respectively, demonstrating the highest performance recorded for stencil programs on FPGAs to date. We then leverage the framework to study a complex stencil program from a production weather simulation application. Our work enables productively targeting distributed spatial computing systems with large stencil programs, and offers insight into architecture characteristics required for their efficient execution in practice.