Paper Title
Scalable Light-Weight Integration of FPGA Based Accelerators with Chip Multi-Processors
Paper Authors
Abstract
Modern multicore systems are migrating from homogeneous designs to heterogeneous designs with accelerator-based computing in order to overcome the performance and power walls. Within this trend, FPGA-based accelerators are becoming increasingly attractive because of their flexibility and low design cost. In this paper, we propose architectural support for efficient interfacing between multiple FPGA-based accelerators and chip multiprocessors (CMPs) connected through a network-on-chip (NoC). Distributed packet receivers and hierarchical packet senders are designed to maintain scalability and reduce the critical path delay under heavy task load. A dedicated accelerator chaining mechanism is also proposed to facilitate intra-FPGA data reuse among accelerators, avoiding the prohibitive communication overhead between the FPGA and the processors. To evaluate the proposed architecture, we perform a complete system emulation with programmability support using FPGA prototyping. Experimental results demonstrate that the proposed architecture achieves high performance while remaining lightweight and scalable.