Paper Title

Network Partitioning and Avoidable Contention

Authors

Oltchik, Yishai; Schwartz, Oded

Abstract

Network contention frequently dominates the run time of parallel algorithms and limits scaling performance. Most previous studies mitigate or eliminate contention by using one of several approaches: communication-minimizing algorithms; hotspot-avoiding routing schemes; topology-aware task mapping; or improving global network properties, such as bisection bandwidth, edge expansion, partitioning, and network diameter. In practice, parallel jobs often use only a fraction of a host system. How do processor allocation policies affect contention within a partition? We use edge-isoperimetric analysis of network graphs to determine whether a network partition has optimal internal bisection. Increasing the bisection allows more efficient use of the network resources, decreasing or completely eliminating link contention. We first study torus networks and characterize the partition geometries that maximize internal bisection bandwidth. We then examine the allocation policies of Mira and JUQUEEN, the two largest publicly accessible Blue Gene/Q torus-based supercomputers. Our analysis demonstrates that the bisection bandwidth of their current partitions can often be improved by changing the partitions' geometries. Such changes can yield up to a 2x speedup for contention-bound workloads. Benchmarking experiments validate the predictions. Our analysis applies to the allocation policies of other networks.
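To illustrate why partition geometry can matter, the following minimal sketch compares two torus partitions with the same node count. It is not the authors' analysis: it assumes a simplified model in which every dimension wraps around, has an even size greater than 2, and all links have equal bandwidth, and it counts only the links crossed when the partition is halved perpendicular to its longest dimension. The function name `longest_dim_cut_links` and the example shapes are hypothetical.

```python
from math import prod


def longest_dim_cut_links(dims):
    """Count links crossed when a torus partition is halved
    perpendicular to its longest dimension.

    Simplified model (assumption, not taken from the paper): every
    dimension wraps around, has an even size greater than 2, and all
    links have equal bandwidth.
    """
    if any(d <= 2 or d % 2 for d in dims):
        raise ValueError("model assumes even dimension sizes greater than 2")
    nodes = prod(dims)
    # There are nodes / max(dims) rings along the longest dimension,
    # and halving the partition cuts each ring in exactly two places.
    return 2 * nodes // max(dims)


# Two hypothetical 256-node partition shapes.
elongated = (16, 4, 4)
compact = (8, 8, 4)

print(longest_dim_cut_links(elongated))  # links across the cut, elongated shape
print(longest_dim_cut_links(compact))    # links across the cut, compact shape
```

Under these assumptions the more compact shape has twice as many links across the cut as the elongated one, even though both contain the same number of nodes. A factor of two at the cut is the kind of geometric difference the abstract refers to.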
