Paper Title

High-Performance Distributed RMA Locks

Authors

Patrick Schmid, Maciej Besta, Torsten Hoefler

Abstract

We propose a topology-aware distributed Reader-Writer lock that accelerates irregular workloads for supercomputers and data centers. The core idea behind the lock is a modular design that is an interplay of three distributed data structures: a counter of readers/writers in the critical section, a set of queues for ordering writers waiting for the lock, and a tree that binds all the queues and synchronizes writers with readers. Each structure is associated with a parameter for favoring either readers or writers, enabling adjustable performance that can be viewed as a point in a three-dimensional parameter space. We also develop a distributed topology-aware MCS lock that is a building block of the above design and improves state-of-the-art MPI implementations. Both schemes use non-blocking Remote Memory Access (RMA) techniques for highest performance and scalability. We evaluate our schemes on a Cray XC30 and illustrate that they outperform state-of-the-art MPI-3 RMA locking protocols by 81% and 73%, respectively. Finally, we use them to accelerate a distributed hashtable that represents irregular workloads such as key-value stores or graph processing.
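
The abstract names the MCS lock as a building block of the design. As a rough illustration of the queue idea only, the following is a minimal sketch of a classic MCS queue lock for threads in a single shared address space, written with C++ `std::atomic`. It is not the paper's distributed, topology-aware RMA implementation; in a distributed setting the swap and compare-and-swap steps would presumably be issued as remote atomic operations instead. All names here (`QNode`, `McsLock`, `run_counter_demo`) are hypothetical and chosen for illustration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// One queue node per waiting thread (hypothetical name, not from the paper).
struct QNode {
    std::atomic<QNode*> next{nullptr};
    std::atomic<bool> locked{false};
};

// Shared-memory MCS queue lock: waiters form a linked list and each
// waiter spins only on the flag inside its own node.
class McsLock {
    std::atomic<QNode*> tail_{nullptr};

public:
    void acquire(QNode& me) {
        me.next.store(nullptr);
        me.locked.store(true);
        // Atomically append this node to the end of the queue.
        QNode* pred = tail_.exchange(&me);
        if (pred != nullptr) {
            // Link behind the predecessor, then wait for the hand-off.
            pred->next.store(&me);
            while (me.locked.load()) {
                std::this_thread::yield();
            }
        }
    }

    void release(QNode& me) {
        QNode* succ = me.next.load();
        if (succ == nullptr) {
            // No visible successor: try to reset the queue to empty.
            QNode* expected = &me;
            if (tail_.compare_exchange_strong(expected, nullptr)) {
                return;
            }
            // A successor swapped the tail but has not linked itself yet.
            while ((succ = me.next.load()) == nullptr) {
                std::this_thread::yield();
            }
        }
        // Pass the lock directly to the next waiter.
        succ->locked.store(false);
    }
};

// Demo helper: each thread increments a plain (non-atomic) counter
// while holding the lock, and the final counter value is returned.
long run_counter_demo(int num_threads, int iters_per_thread) {
    McsLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&lock, &counter, iters_per_thread] {
            QNode node;  // lives on this thread's stack for its whole run
            for (int i = 0; i < iters_per_thread; ++i) {
                lock.acquire(node);
                ++counter;
                lock.release(node);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Calling `run_counter_demo(4, 10000)` should return the full count of 40000 if mutual exclusion holds. The sketch uses the default sequentially consistent ordering for simplicity; a tuned version would likely relax some of these operations.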
