Paper title
Distributed, combined CPU and GPU profiling within HPX using APEX
Paper authors
Paper abstract
Benchmarking and comparing the performance of a scientific simulation across hardware platforms is a complex task. When the simulation in question is built on an asynchronous many-task (AMT) runtime that offloads work to GPUs, the task becomes even more complex. In this paper, we discuss the use of a uniquely suited performance measurement library, APEX, to capture the performance behavior of a simulation built on HPX, a highly scalable, distributed AMT runtime. We examine the performance of the astrophysics simulation carried out by Octo-Tiger on two different supercomputing architectures. We analyze the results of scaling and measurement overheads. In addition, we look in depth at two similarly configured executions on the two systems to study how architectural differences affect performance and to identify opportunities for optimization. As one such opportunity, we optimize the communication for the hydro solver and investigate its performance impact.