Paper Title
ReGraph: Scaling Graph Processing on HBM-enabled FPGAs with Heterogeneous Pipelines
Authors
Abstract
The use of FPGAs for efficient graph processing has attracted significant interest. Recent memory subsystem upgrades, including the introduction of HBM in FPGAs, promise to further alleviate memory bottlenecks. However, modern multi-channel HBM requires many more processing pipelines to fully utilize its bandwidth potential. Existing designs do not scale well, resulting in underutilization of the HBM facilities even when all other resources are fully consumed. In this paper, we re-examine graph processing workloads and find considerable diversity in processing. We also find that the diverse workloads can be easily classified into two types, namely dense and sparse partitions. This motivates us to propose a resource-efficient heterogeneous pipeline architecture. Our heterogeneous architecture comprises two types of pipelines: Little pipelines to process dense partitions with good locality, and Big pipelines to process sparse partitions with extremely poor locality. Unlike traditional monolithic pipeline designs, the heterogeneous pipelines are tailored for more specific memory access patterns and hence are more lightweight, allowing the architecture to scale up more effectively with limited resources. In addition, we propose a model-guided task scheduling method that schedules partitions to the right pipeline types, generates the most efficient pipeline combination, and balances workloads. Furthermore, we develop an automated open-source framework, called ReGraph, which automates the entire development process. ReGraph outperforms state-of-the-art FPGA accelerators by up to 5.9 times in terms of performance and 12 times in terms of resource efficiency.
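
The scheduling idea in the abstract (send dense partitions to Little pipelines, send sparse partitions to Big pipelines, then balance the load within each group) can be illustrated with a small sketch. This is not ReGraph's scheduler: the abstract does not describe the performance model, so the sketch assumes a simple edge-count threshold (`dense_threshold`) for classification and uses a greedy longest-first heuristic for balancing. All names here (`Partition`, `classify`, `balance`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    pid: int        # partition identifier
    num_edges: int  # edge count, used here as a stand-in for workload


def classify(partitions, dense_threshold):
    """Split partitions into dense and sparse groups.

    Assumption: a partition counts as dense when its edge count reaches
    dense_threshold. The paper uses a model-guided decision instead.
    """
    dense = [p for p in partitions if p.num_edges >= dense_threshold]
    sparse = [p for p in partitions if p.num_edges < dense_threshold]
    return dense, sparse


def balance(partitions, num_pipelines):
    """Greedy balancing: give the largest remaining partition to the
    pipeline that currently has the smallest total load."""
    loads = [0] * num_pipelines
    assignment = [[] for _ in range(num_pipelines)]
    for p in sorted(partitions, key=lambda q: q.num_edges, reverse=True):
        target = loads.index(min(loads))
        assignment[target].append(p.pid)
        loads[target] += p.num_edges
    return assignment, loads


# Illustrative data only, not taken from the paper.
parts = [Partition(i, e) for i, e in enumerate([100, 90, 5, 3, 2, 80])]
dense, sparse = classify(parts, dense_threshold=50)
little_assignment, little_loads = balance(dense, num_pipelines=2)  # Little pipelines
big_assignment, big_loads = balance(sparse, num_pipelines=1)       # Big pipelines
```

In this toy setup the number of pipelines of each type is fixed by hand. According to the abstract, ReGraph also chooses the pipeline combination itself, which this sketch does not attempt.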