（poly）对数的时间构建圆形$ n $ n $ block广播时间表，用于广播和MPI中的不规则allgather

论文标题

（poly）对数的时间构建圆形$ n $ n $ block广播时间表，用于广播和MPI中的不规则allgather

(Poly)Logarithmic Time Construction of Round-optimal $n$-Block Broadcast Schedules for Broadcast and irregular Allgather in MPI

论文作者

Träff, Jesper Larsson

论文摘要

我们为最佳通信时间表的快速（ER）提供了快速的（ER），无通信的并行构造，以允许$ N $不同的数据块从根处理器到所有其他处理器，以$ 1 $ ported，$ p $ - $ P $ - 过程 - 流行过程，并具有完全双向通信。对于任何$ p $和$ n $，此模型中的广播都需要$ n-1+\ lceil \ log_2 p \ rceil $通信回合。与其他结构相比，所有处理器都遵循相同的循环图形通信模式，这使得可以使用Allgather（全范围）操作的时间表。新构造需要$ O（\ log^3 p）$每个处理器，每个处理器都可以独立于$ O（\ log P）$空间以其他处理器独立计算其时间表的一部分。结果是对顺序$ o的显着改善（p \ log^2 p）$时间和$ o（p \ log p）$träffand Ripke（2009）的太空建设，其实用性非常实用。 The round-optimal schedule construction is then used to implement communication optimal algorithms for the broadcast and (irregular) allgather collective operations as found in MPI (the \emph{Message-Passing Interface}), and significantly and practically improves over the implementations in standard MPI libraries (\texttt{mpich}, OpenMPI, Intel MPI) for certain problem ranges.不规则的Allgather操作的应用是全新的。

We give a fast(er), communication-free, parallel construction of optimal communication schedules that allow broadcasting of $n$ distinct blocks of data from a root processor to all other processors in $1$-ported, $p$-processor networks with fully bidirectional communication. For any $p$ and $n$, broadcasting in this model requires $n-1+\lceil\log_2 p\rceil$ communication rounds. In contrast to other constructions, all processors follow the same, circulant graph communication pattern, which makes it possible to use the schedules for the allgather (all-to-all-broadcast) operation as well. The new construction takes $O(\log^3 p)$ time steps per processor, each of which can compute its part of the schedule independently of the other processors in $O(\log p)$ space. The result is a significant improvement over the sequential $O(p \log^2 p)$ time and $O(p\log p)$ space construction of Träff and Ripke (2009) with considerable practical import. The round-optimal schedule construction is then used to implement communication optimal algorithms for the broadcast and (irregular) allgather collective operations as found in MPI (the \emph{Message-Passing Interface}), and significantly and practically improves over the implementations in standard MPI libraries (\texttt{mpich}, OpenMPI, Intel MPI) for certain problem ranges. The application to the irregular allgather operation is entirely new.

下载PDF全文

下载文献需遵守相关版权规定

论文标题