Paper Title

Characterizing Optimizations to Memory Access Patterns using Architecture-Independent Program Features

Authors

Aditya Chilukuri, Josh Milthorpe, Beau Johnston

Abstract

High-performance computing (HPC) developers face the challenge of optimizing the performance of OpenCL workloads on diverse architectures. The Architecture-Independent Workload Characterization (AIWC) tool is a plugin for the Oclgrind OpenCL simulator that gathers metrics of OpenCL programs; these metrics can be used to understand and predict program performance on any given hardware architecture. However, AIWC metrics are not always easily interpreted, and they do not capture some important memory access patterns that affect efficiency across architectures. We propose a new metric of parallel spatial locality: the closeness of memory accesses issued simultaneously by OpenCL work-items (threads). We implement the parallel spatial locality metric in the AIWC framework and analyse the gathered results for matrix multiply and the Extended OpenDwarfs OpenCL benchmarks. Differences in the observed parallel spatial locality metric across implementations of matrix multiply reflect the optimizations performed. The new metric can also be used to distinguish between the OpenDwarfs benchmarks based on the memory access patterns that affect their performance on various architectures. The suggested improvements to AIWC will help HPC developers better understand the memory access patterns of complex codes and guide the optimization of codes for arbitrary hardware targets.
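
The abstract defines parallel spatial locality only informally, as the closeness of memory accesses issued simultaneously by work-items. The Python sketch below shows one plausible way to quantify that idea; it is an illustration under stated assumptions, not the authors' AIWC implementation. It assumes that each "timestep" is the list of byte addresses issued at once by a group of work-items, scores each timestep by the Shannon entropy of those addresses after discarding `skip_bits` low-order bits (so addresses inside the same aligned block count as identical), and averages the scores over timesteps; lower values mean the simultaneous accesses are closer together. The matrix size, the 16-wide work-item group, the base address, and all function names are hypothetical.

```python
import math
from collections import Counter

def address_entropy(addresses, skip_bits=0):
    """Shannon entropy (in bits) of one timestep's addresses after dropping
    `skip_bits` low-order bits, so addresses in the same aligned block match."""
    if not addresses:
        return 0.0
    counts = Counter(addr >> skip_bits for addr in addresses)
    n = len(addresses)
    # sum of p * log2(1/p), written as log2(n/c) so a single block gives 0.0
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def parallel_spatial_locality(timesteps, skip_bits=0):
    """Hypothetical metric: mean address entropy over timesteps, where each
    timestep lists the addresses issued simultaneously by work-items.
    Lower values mean simultaneous accesses are closer together."""
    if not timesteps:
        return 0.0
    return sum(address_entropy(t, skip_bits) for t in timesteps) / len(timesteps)

# Hypothetical matrix-multiply setting: work-item (i, j) computes C[i][j] and,
# at loop step k, reads one element of B. All values below are assumptions.
N = 64            # matrix dimension
WORD = 4          # bytes per float
W = 16            # work-items issuing together (e.g. one SIMD group)
B_BASE = 0x10000  # base address of B, 64-byte aligned

def b_row_major(j, k):
    return B_BASE + WORD * (k * N + j)  # address of B[k][j]

def b_transposed(j, k):
    return B_BASE + WORD * (j * N + k)  # address of Bt[j][k], which holds B[k][j]

def timesteps(addr_fn):
    # At each step k, work-items j = 0..W-1 of one row each read their B element.
    return [[addr_fn(j, k) for j in range(W)] for k in range(N)]

if __name__ == "__main__":
    for name, fn in [("B row-major", b_row_major), ("B transposed", b_transposed)]:
        ts = timesteps(fn)
        print(f"{name}: skip 0 bits -> {parallel_spatial_locality(ts, 0):.2f}, "
              f"skip 6 bits -> {parallel_spatial_locality(ts, 6):.2f}")
    # prints: B row-major: skip 0 bits -> 4.00, skip 6 bits -> 0.00
    # prints: B transposed: skip 0 bits -> 4.00, skip 6 bits -> 4.00
```

With a row-major `B`, the 16 work-items read consecutive floats that fall in one 64-byte block, so the score drops to zero once 6 low bits are skipped; with a transposed `B`, the reads are 256 bytes apart and stay spread out. This mirrors how a layout change that helps a single CPU thread read sequentially can hurt simultaneous (for example, coalesced GPU) accesses, which is the kind of difference a parallel locality metric is meant to expose.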
