Heterogeneous Parallelization and Acceleration of Molecular Dynamics Simulations in GROMACS

Paper title

Heterogeneous Parallelization and Acceleration of Molecular Dynamics Simulations in GROMACS

Authors

Szilárd Páll, Artem Zhmurov, Paul Bauer, Mark Abraham, Magnus Lundborg, Alan Gray, Berk Hess, Erik Lindahl

Abstract

The introduction of accelerator devices such as graphics processing units (GPUs) has had profound impact on molecular dynamics simulations and has enabled order-of-magnitude performance advances using commodity hardware. To fully reap these benefits, it has been necessary to reformulate some of the most fundamental algorithms, including the Verlet list, pair searching and cut-offs. Here, we present the heterogeneous parallelization and acceleration design of molecular dynamics implemented in the GROMACS codebase over the last decade. The setup involves a general cluster-based approach to pair lists and non-bonded pair interactions that utilizes both GPUs and CPU SIMD acceleration efficiently, including the ability to load-balance tasks between CPUs and GPUs. The algorithm work efficiency is tuned for each type of hardware, and to use accelerators more efficiently we introduce dual pair lists with rolling pruning updates. Combined with new direct GPU-GPU communication as well as GPU integration, this enables excellent performance from single GPU simulations through strong scaling across multiple GPUs and efficient multi-node parallelization.
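
To make the dual pair list idea in the abstract more concrete, here is a minimal sketch, not the GROMACS implementation. It assumes a per-atom list rather than the cluster-based layout the paper describes, ignores periodic boundaries, and uses hypothetical names (`build_outer_list`, `prune`, `r_outer`, `r_inner`). The outer list is built rarely with a generous cut-off; the inner list is obtained by cheaply pruning the outer list with a shorter cut-off, which can be repeated more often as atoms move.

```python
import itertools
import math


def build_outer_list(positions, r_outer):
    """Expensive step, done rarely: all pairs (i, j), i < j, closer than r_outer.

    Assumption: brute-force O(N^2) search stands in for a real grid-based search.
    """
    return [
        (i, j)
        for i, j in itertools.combinations(range(len(positions)), 2)
        if math.dist(positions[i], positions[j]) < r_outer
    ]


def prune(outer_list, positions, r_inner):
    """Cheap step, done often: keep only outer-list pairs closer than r_inner."""
    return [
        (i, j)
        for i, j in outer_list
        if math.dist(positions[i], positions[j]) < r_inner
    ]


# Four atoms on a line (arbitrary illustrative coordinates).
positions = [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0), (1.3, 0.0, 0.0), (3.0, 0.0, 0.0)]

outer = build_outer_list(positions, r_outer=1.5)  # long-lived list with a large buffer
inner = prune(outer, positions, r_inner=1.0)      # short-lived list used for force evaluation

print(outer)
print(inner)
```

In this toy setup the pruning step only scans pairs already in the outer list, so its cost scales with the list length rather than with all atom pairs. The "rolling" aspect mentioned in the abstract, where pruning work is spread over several steps, is not modeled here.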
