论文标题

(poly)对数的时间构建圆形$ n $ n $ block广播时间表,用于广播和MPI中的不规则allgather

(Poly)Logarithmic Time Construction of Round-optimal $n$-Block Broadcast Schedules for Broadcast and irregular Allgather in MPI

论文作者

Träff, Jesper Larsson

论文摘要

我们为最佳通信时间表的快速(ER)提供了快速的(ER),无通信的并行构造,以允许$ N $不同的数据块从根处理器到所有其他处理器,以$ 1 $ ported,$ p $ - $ P $ - 过程 - 流行过程,并具有完全双向通信。对于任何$ p $和$ n $,此模型中的广播都需要$ n-1+\ lceil \ log_2 p \ rceil $通信回合。与其他结构相比,所有处理器都遵循相同的循环图形通信模式,这使得可以使用Allgather(全范围)操作的时间表。新构造需要$ O(\ log^3 p)$每个处理器,每个处理器都可以独立于$ O(\ log P)$空间以其他处理器独立计算其时间表的一部分。结果是对顺序$ o的显着改善(p \ log^2 p)$时间和$ o(p \ log p)$träffand Ripke(2009)的太空建设,其实用性非常实用。 The round-optimal schedule construction is then used to implement communication optimal algorithms for the broadcast and (irregular) allgather collective operations as found in MPI (the \emph{Message-Passing Interface}), and significantly and practically improves over the implementations in standard MPI libraries (\texttt{mpich}, OpenMPI, Intel MPI) for certain problem ranges.不规则的Allgather操作的应用是全新的。

We give a fast(er), communication-free, parallel construction of optimal communication schedules that allow broadcasting of $n$ distinct blocks of data from a root processor to all other processors in $1$-ported, $p$-processor networks with fully bidirectional communication. For any $p$ and $n$, broadcasting in this model requires $n-1+\lceil\log_2 p\rceil$ communication rounds. In contrast to other constructions, all processors follow the same, circulant graph communication pattern, which makes it possible to use the schedules for the allgather (all-to-all-broadcast) operation as well. The new construction takes $O(\log^3 p)$ time steps per processor, each of which can compute its part of the schedule independently of the other processors in $O(\log p)$ space. The result is a significant improvement over the sequential $O(p \log^2 p)$ time and $O(p\log p)$ space construction of Träff and Ripke (2009) with considerable practical import. The round-optimal schedule construction is then used to implement communication optimal algorithms for the broadcast and (irregular) allgather collective operations as found in MPI (the \emph{Message-Passing Interface}), and significantly and practically improves over the implementations in standard MPI libraries (\texttt{mpich}, OpenMPI, Intel MPI) for certain problem ranges. The application to the irregular allgather operation is entirely new.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源