Paper Title

AN5D: Automated Stencil Framework for High-Degree Temporal Blocking on GPUs

Authors

Kazuaki Matsumura, Hamid Reza Zohouri, Mohamed Wahib, Toshio Endo, Satoshi Matsuoka

Abstract

Stencil computation is one of the most widely-used compute patterns in high performance computing applications. Spatial and temporal blocking have been proposed to overcome the memory-bound nature of this type of computation by moving memory pressure from external memory to on-chip memory on GPUs. However, correctly implementing those optimizations while considering the complexity of the architecture and memory hierarchy of GPUs to achieve high performance is difficult. We propose AN5D, an automated stencil framework which is capable of automatically transforming and optimizing stencil patterns in a given C source code, and generating corresponding CUDA code. Parameter tuning in our framework is guided by our performance model. Our novel optimization strategy reduces shared memory and register pressure in comparison to existing implementations, allowing performance scaling up to a temporal blocking degree of 10. We achieve the highest performance reported so far for all evaluated stencil benchmarks on the state-of-the-art Tesla V100 GPU.
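
To make the idea of temporal blocking concrete, the following is a minimal CPU-side sketch in plain C, not AN5D's generated CUDA code or its actual blocking scheme. It assumes a 1D 3-point Jacobi stencil with fixed boundary values. The names `stencil_naive`, `stencil_blocked`, `MAX_TILE`, `tile`, and `bt` are hypothetical and chosen only for illustration. The blocked version loads each tile plus a halo of `bt` points per side into a small local buffer (standing in for on-chip memory), advances it several time steps there, and only then writes the tile back, so the large array is read and written once per `bt` steps instead of once per step.

```c
#include <assert.h>
#include <string.h>

#define MAX_TILE 64 /* assumed capacity of the local buffers (tile + halo) */

/* One Jacobi step on indices lo..hi-1; other entries of out are untouched. */
static void step(const double *in, double *out, int lo, int hi) {
    for (int i = lo; i < hi; i++)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
}

/* Reference version: sweep the whole array once per time step.
 * a[0] and a[n-1] are fixed boundary values; tmp is scratch of size n. */
void stencil_naive(double *a, double *tmp, int n, int steps) {
    memcpy(tmp, a, (size_t)n * sizeof(double));
    for (int t = 0; t < steps; t++) {
        step(a, tmp, 1, n - 1);
        memcpy(a, tmp, (size_t)n * sizeof(double));
    }
}

/* Overlapped temporal blocking: each tile advances up to bt time steps
 * inside a local buffer before its result is written back. */
void stencil_blocked(double *a, double *tmp, int n, int steps,
                     int tile, int bt) {
    assert(bt >= 1 && tile >= 1 && tile + 2 * bt <= MAX_TILE);
    double buf0[MAX_TILE], buf1[MAX_TILE];

    for (int t0 = 0; t0 < steps; t0 += bt) {
        int k = (steps - t0 < bt) ? steps - t0 : bt; /* steps in this block */
        for (int s = 0; s < n; s += tile) {
            int e = (s + tile < n) ? s + tile : n;   /* tile is [s, e) */
            int lo = (s - k > 0) ? s - k : 0;        /* halo, clamped to domain */
            int hi = (e + k < n) ? e + k : n;
            int len = hi - lo;

            /* Both buffers start as copies so fixed boundary cells stay valid. */
            memcpy(buf0, a + lo, (size_t)len * sizeof(double));
            memcpy(buf1, buf0, (size_t)len * sizeof(double));
            double *cur = buf0, *nxt = buf1;

            for (int j = 1; j <= k; j++) {
                /* The valid region shrinks by one point per step on each side,
                 * except where the buffer edge is a fixed domain boundary. */
                int from = (lo == 0) ? 1 : j;
                int to = (hi == n) ? len - 1 : len - j;
                step(cur, nxt, from, to);
                double *sw = cur; cur = nxt; nxt = sw;
            }
            /* Write to tmp, not a: neighbouring tiles still need old values. */
            memcpy(tmp + s, cur + (s - lo), (size_t)(e - s) * sizeof(double));
        }
        memcpy(a, tmp, (size_t)n * sizeof(double));
    }
}
```

Calling `stencil_naive(a, tmp, n, steps)` and `stencil_blocked(b, tmp, n, steps, 8, 3)` on identical inputs should leave `a` and `b` equal. The halo cells are recomputed by adjacent tiles, which is the redundancy that overlapped tiling trades for fewer passes over the large array; a larger `bt` saves more traffic but needs a wider halo and more local storage, which is the kind of pressure the abstract refers to.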
