LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics

Paper title

LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics

Authors

Zhiqiang Que, Hongxiang Fan, Marcus Loo, He Li, Michaela Blott, Maurizio Pierini, Alexander Tapper, Wayne Luk

Abstract

This work presents a novel reconfigurable architecture for Low Latency Graph Neural Network (LL-GNN) designs for particle detectors, delivering unprecedented low latency performance. Incorporating FPGA-based GNNs into particle detectors presents a unique challenge since it requires sub-microsecond latency to deploy the networks for online event selection with a data rate of hundreds of terabytes per second in the Level-1 triggers at the CERN Large Hadron Collider experiments. This paper proposes a novel outer-product based matrix multiplication approach, which is enhanced by exploiting the structured adjacency matrix and a column-major data layout. Moreover, a fusion step is introduced to further reduce the end-to-end design latency by eliminating unnecessary boundaries. Furthermore, a GNN-specific algorithm-hardware co-design approach is presented which not only finds a design with a much better latency but also finds a high accuracy design under given latency constraints. To facilitate this, a customizable template for this low latency GNN hardware architecture has been designed and open-sourced, which enables the generation of low-latency FPGA designs with efficient resource utilization using a high-level synthesis tool. Evaluation results show that our FPGA implementation is up to 9.0 times faster and achieves up to 13.1 times higher power efficiency than a GPU implementation. Compared to the previous FPGA implementations, this work achieves 6.51 to 16.7 times lower latency. Moreover, the latency of our FPGA design is sufficiently low to enable deployment of GNNs in a sub-microsecond, real-time collider trigger system, enabling it to benefit from improved accuracy. The proposed LL-GNN design advances the next generation of trigger systems by enabling sophisticated algorithms to process experimental data efficiently.
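As a rough illustration of the outer-product formulation the abstract mentions, the minimal sketch below computes a matrix product as a sum of rank-1 outer products. It consumes one column of the left operand per step, which is the access pattern a column-major layout serves naturally. This is not the authors' implementation: the function name `matmul_outer` is hypothetical, and the sketch does not model the structured adjacency matrix, the fusion step, or any FPGA-specific detail.

```python
def matmul_outer(a, b):
    """Compute the product of a (n x k) and b (k x m) as a sum of outer products.

    Both operands are lists of rows. Hypothetical helper for illustration only.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for p in range(k):
        col = [a[i][p] for i in range(n)]  # p-th column of a
        row = b[p]                         # p-th row of b
        # Accumulate the rank-1 update col * row^T into the result.
        for i in range(n):
            for j in range(m):
                c[i][j] += col[i] * row[j]
    return c


if __name__ == "__main__":
    print(matmul_outer([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In this form every step updates all entries of the result independently, so the inner two loops have no dependency between iterations and could in principle be unrolled in hardware. Whether and how the paper's design does so is not stated in the abstract.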
