Title
Cross-platform programming model for many-core lattice Boltzmann simulations
Authors
Abstract
We present a novel, hardware-agnostic implementation strategy for lattice Boltzmann (LB) simulations that delivers high performance on both homogeneous and heterogeneous many-core platforms. Based solely on C++17 Parallel Algorithms, our approach does not rely on any language extensions, external libraries, vendor-specific code annotations, or pre-compilation steps. Thanks in particular to a recently proposed GPU back-end for C++17 Parallel Algorithms, we show that a single code can compile and reach state-of-the-art performance on both many-core CPU and GPU environments for the solution of a given non-trivial fluid dynamics problem. The proposed strategy is evaluated with six different, commonly used implementation schemes to assess the performance impact of memory access patterns on different platforms. Nine different LB collision models are included in the tests and exhibit good performance, demonstrating the versatility of our parallel approach. This work shows that it is less necessary than ever to draw a distinction between research and production software, as a concise and generic LB implementation yields performance comparable to that achievable in a hardware-specific programming language. The results also highlight the performance gains achieved by modern many-core CPUs and their apparent capability to narrow the gap with the traditionally much faster GPU platforms. All code is made available to the community in the form of the open-source project "stlbm", which serves both as stand-alone simulation software and as a collection of reusable patterns for the acceleration of pre-existing LB codes.
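To illustrate what "based solely on C++17 Parallel Algorithms" can look like in practice, here is a minimal sketch (not the stlbm source) that expresses one LB time step as a single `std::for_each` over cell indices, with the execution policy passed in as an argument. It assumes a D2Q9 lattice, the BGK collision model, a fully periodic domain, a structure-of-arrays memory layout and a two-array ("push") collide-and-stream scheme. The names `collide_and_stream`, `idx`, `feq`, `init_equilibrium` and the `cells` index vector are hypothetical. Passing `std::execution::par_unseq` instead of `std::execution::seq` requests parallel, vectorized execution. With a compiler whose Parallel Algorithms back-end targets GPUs, such as NVIDIA's `nvc++ -stdpar`, the same loop may be offloaded, subject to that compiler's restrictions (for example, data used inside the loop should be heap-allocated).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <execution>
#include <numeric>
#include <utility>
#include <vector>

// D2Q9 velocities and weights (standard values), kept as function-local
// constants so the per-cell lambda reads no global variables.
inline int cx(int i) { constexpr int v[9] = {0, 1, 0, -1, 0, 1, -1, -1, 1}; return v[i]; }
inline int cy(int i) { constexpr int v[9] = {0, 0, 1, 0, -1, 1, 1, -1, -1}; return v[i]; }
inline double wt(int i) {
    constexpr double v[9] = {4.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 9,
                             1.0 / 36, 1.0 / 36, 1.0 / 36, 1.0 / 36};
    return v[i];
}

// Second-order BGK equilibrium of population i (lattice units, cs^2 = 1/3).
inline double feq(int i, double rho, double ux, double uy) {
    const double cu = 3.0 * (cx(i) * ux + cy(i) * uy);
    const double usq = 1.5 * (ux * ux + uy * uy);
    return wt(i) * rho * (1.0 + cu + 0.5 * cu * cu - usq);
}

// Hypothetical structure-of-arrays (SoA) address of population i in cell k.
// An array-of-structures (AoS) layout would return k * 9 + i instead.
inline std::size_t idx(std::size_t k, int i, std::size_t nCells) {
    return static_cast<std::size_t>(i) * nCells + k;
}

// Fill f with a uniform equilibrium state.
inline void init_equilibrium(std::vector<double>& f, std::size_t nCells,
                             double rho0, double ux, double uy) {
    f.assign(9 * nCells, 0.0);
    for (std::size_t k = 0; k < nCells; ++k)
        for (int i = 0; i < 9; ++i) f[idx(k, i, nCells)] = feq(i, rho0, ux, uy);
}

// One fused BGK collide-and-stream step on a periodic nx-by-ny grid.
// Reads f and writes fNew ("push" streaming). Each (target cell, direction)
// slot of fNew is written by exactly one source cell, so the loop body is
// free of data races under any execution policy.
template <class Policy>
void collide_and_stream(Policy&& policy, const std::vector<int>& cells,
                        const double* f, double* fNew, int nx, int ny, double omega) {
    assert(omega > 0.0 && omega < 2.0);  // BGK stability range
    const std::size_t nCells = static_cast<std::size_t>(nx) * ny;
    std::for_each(std::forward<Policy>(policy), cells.begin(), cells.end(), [=](int k) {
        const int x = k % nx, y = k / nx;
        double fin[9];
        double rho = 0.0, jx = 0.0, jy = 0.0;
        for (int i = 0; i < 9; ++i) {  // gather populations and moments
            fin[i] = f[idx(k, i, nCells)];
            rho += fin[i];
            jx += cx(i) * fin[i];
            jy += cy(i) * fin[i];
        }
        const double ux = jx / rho, uy = jy / rho;
        for (int i = 0; i < 9; ++i) {
            // Relax toward equilibrium, then push to the periodic neighbour.
            const double fout = fin[i] - omega * (fin[i] - feq(i, rho, ux, uy));
            const int xn = (x + cx(i) + nx) % nx;
            const int yn = (y + cy(i) + ny) % ny;
            fNew[idx(static_cast<std::size_t>(yn) * nx + xn, i, nCells)] = fout;
        }
    });
}
```

A typical driver fills `cells` with `std::iota`, calls `init_equilibrium`, then alternates `collide_and_stream(std::execution::par_unseq, cells, f.data(), fNew.data(), nx, ny, omega)` with `f.swap(fNew)`. With GCC, the parallel policies may require linking against TBB (`-ltbb`). Changing only `idx`, for instance to the AoS form, alters the memory access pattern without touching the physics, which is the kind of variation whose performance impact the study above measures.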