High-Performance Distributed RMA Locks

Paper Title

High-Performance Distributed RMA Locks

Authors

Patrick Schmid, Maciej Besta, Torsten Hoefler

Abstract

We propose a topology-aware distributed Reader-Writer lock that accelerates irregular workloads for supercomputers and data centers. The core idea behind the lock is a modular design built on the interplay of three distributed data structures: a counter of readers/writers in the critical section, a set of queues for ordering writers waiting for the lock, and a tree that binds all the queues and synchronizes writers with readers. Each structure is associated with a parameter for favoring either readers or writers, enabling adjustable performance that can be viewed as a point in a three-dimensional parameter space. We also develop a distributed topology-aware MCS lock that is a building block of the above design and improves state-of-the-art MPI implementations. Both schemes use non-blocking Remote Memory Access (RMA) techniques for highest performance and scalability. We evaluate our schemes on a Cray XC30 and illustrate that they outperform state-of-the-art MPI-3 RMA locking protocols by 81% and 73%, respectively. Finally, we use them to accelerate a distributed hashtable that represents irregular workloads such as key-value stores or graph processing.
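
The abstract names the MCS lock as the building block of the design. For readers unfamiliar with it, the following is a minimal sketch of the classic shared-memory MCS queue lock using C++ atomics. It is not the paper's distributed, topology-aware RMA implementation: the names `McsNode`, `McsLock`, and `run_counter_demo` are hypothetical, and threads within one process stand in for distributed processes. In an RMA setting, the atomic exchange and compare-and-swap on the tail pointer would plausibly map to remote atomics such as `MPI_Fetch_and_op` and `MPI_Compare_and_swap`, but that mapping is an assumption here rather than something the abstract states.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical illustration types; not taken from the paper.
// One queue node per waiter; each waiter spins only on its own flag.
struct McsNode {
    std::atomic<McsNode*> next{nullptr};
    std::atomic<bool> locked{false};
};

class McsLock {
public:
    void acquire(McsNode& me) {
        me.next.store(nullptr, std::memory_order_relaxed);
        me.locked.store(true, std::memory_order_relaxed);
        // Atomically append this node to the tail of the waiter queue.
        McsNode* pred = tail_.exchange(&me, std::memory_order_acq_rel);
        if (pred != nullptr) {
            // Queue was not empty: link behind the predecessor and wait
            // until it hands the lock over by clearing our flag.
            pred->next.store(&me, std::memory_order_release);
            while (me.locked.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
        }
    }

    void release(McsNode& me) {
        McsNode* succ = me.next.load(std::memory_order_acquire);
        if (succ == nullptr) {
            // No visible successor: try to mark the queue as empty.
            McsNode* expected = &me;
            if (tail_.compare_exchange_strong(expected, nullptr,
                                              std::memory_order_acq_rel)) {
                return;
            }
            // A successor has swapped itself into the tail but has not
            // linked to us yet; wait for the link to appear.
            while ((succ = me.next.load(std::memory_order_acquire)) == nullptr) {
                std::this_thread::yield();
            }
        }
        // Pass the lock directly to the successor.
        succ->locked.store(false, std::memory_order_release);
    }

private:
    std::atomic<McsNode*> tail_{nullptr};
};

// Demo helper: several threads increment a plain (non-atomic) counter
// that is protected only by the MCS lock, then the total is returned.
long run_counter_demo(int num_threads, int iters) {
    assert(num_threads > 0 && iters >= 0);
    McsLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&lock, &counter, iters] {
            McsNode node;  // per-thread queue node, reused across acquisitions
            for (int i = 0; i < iters; ++i) {
                lock.acquire(node);
                ++counter;
                lock.release(node);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Calling `run_counter_demo(4, 20000)` from a program linked with thread support should return the full count of increments if mutual exclusion holds. The property that makes MCS attractive as a building block is that every waiter spins on its own node rather than on a shared location, so lock handover touches only the releasing node and its successor.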
