GPU Acceleration of Large-Scale Full-Frequency GW Calculations

Paper Title

GPU Acceleration of Large-Scale Full-Frequency GW Calculations

Authors

Yu, Victor Wen-zhe; Govoni, Marco

Abstract

Many-body perturbation theory is a powerful method to simulate electronic excitations in molecules and materials starting from the output of density functional theory calculations. By implementing the theory efficiently so as to run at scale on the latest leadership high-performance computing systems, it is possible to extend the scope of GW calculations. We present a GPU acceleration study of the full-frequency GW method as implemented in the WEST code. Excellent performance is achieved through the use of (i) optimized GPU libraries, e.g., cuFFT and cuBLAS, (ii) a hierarchical parallelization strategy that minimizes CPU-CPU, CPU-GPU, and GPU-GPU data transfer operations, (iii) nonblocking MPI communications that overlap with GPU computations, and (iv) mixed precision in selected portions of the code. A series of performance benchmarks have been carried out on leadership high-performance computing systems, showing a substantial speedup of the GPU-accelerated version of WEST with respect to its CPU version. Good strong and weak scaling is demonstrated using up to 25920 GPUs. Finally, we showcase the capability of the GPU version of WEST for large-scale, full-frequency GW calculations of realistic systems, e.g., a nanostructure, an interface, and a defect, comprising up to 10368 valence electrons.
