Rocket: Efficient and Scalable All-Pairs Computations on Heterogeneous Platforms

Paper Title

Rocket: Efficient and Scalable All-Pairs Computations on Heterogeneous Platforms

Authors

Stijn Heldens, Pieter Hijma, Ben van Werkhoven, Jason Maassen, Henri Bal, Rob van Nieuwpoort

Abstract

All-pairs compute problems apply a user-defined function to each combination of two items of a given data set. Although these problems present an abundance of parallelism, data reuse must be exploited to achieve good performance. Several researchers have considered this problem, resorting either to partial replication with static work distribution or to dynamic scheduling with full replication. In contrast, we present a solution that relies on hierarchical multi-level software-based caches to maximize data reuse at each level in the distributed memory hierarchy, combined with a divide-and-conquer approach to exploit data locality, hierarchical work-stealing to dynamically balance the workload, and asynchronous processing to maximize resource utilization. We evaluate our solution using three real-world applications (from digital forensics, localization microscopy, and bioinformatics) on different platforms (from a desktop machine to a supercomputer). Results show excellent efficiency and scalability when scaling to 96 GPUs, even obtaining super-linear speedups due to a distributed cache.
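To make the abstract's central idea concrete, here is a minimal single-process sketch of an all-pairs computation that splits the pair space by divide-and-conquer and reads items through a small software cache. It is not Rocket's API or implementation: the names `LRUCache`, `all_pairs`, `compare`, and `leaf` are hypothetical, and the sketch leaves out GPUs, work-stealing, asynchronous processing, and the distributed cache levels.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny software cache (hypothetical); counts how often an item is loaded."""

    def __init__(self, capacity, load):
        self.capacity = capacity
        self.load = load            # function that fetches an item by index
        self.items = OrderedDict()
        self.misses = 0

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)     # mark as most recently used
            return self.items[key]
        self.misses += 1
        value = self.load(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used
        return value


def all_pairs(n, compare, cache, leaf=2):
    """Apply compare to every unordered pair (i, j) with i < j of n items."""
    results = {}

    def visit(r0, r1, c0, c1):
        # The block covers rows [r0, r1) and columns [c0, c1) of the pair matrix.
        if r0 >= r1 or c0 >= c1 or r0 >= c1 - 1:
            return  # empty block, or no pair with i < j inside it
        if r1 - r0 <= leaf and c1 - c0 <= leaf:
            # Small block: its few items are likely to stay in the cache.
            for i in range(r0, r1):
                for j in range(max(c0, i + 1), c1):
                    results[(i, j)] = compare(cache.get(i), cache.get(j))
            return
        # Otherwise split the block into four quadrants and recurse.
        rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
        for rows in ((r0, rm), (rm, r1)):
            for cols in ((c0, cm), (cm, c1)):
                visit(*rows, *cols)

    visit(0, n, 0, n)
    return results
```

A call such as `all_pairs(len(data), compare, LRUCache(4, data.__getitem__))` visits neighbouring pairs together, so items loaded for one small block tend to be reused before eviction. Comparing `cache.misses` across cache capacities gives a rough feel for why data reuse matters here.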

