论文标题
朝分布式的2-焦点施加了steiner在数十亿幅图中的最小树木
Towards Distributed 2-Approximation Steiner Minimal Trees in Billion-edge Graphs
论文作者
论文摘要
给定一个边缘加权图和一组已知的种子顶点,网络科学家通常希望理解图形关系以解释种子顶点之间的联系。当种子组为3或更大的施泰纳最小树时 - 最小的无环连接的子图(输入图的)包含所有种子顶点 - 是最短加权路径的有吸引力的概括。通常,计算施泰纳的最小树是NP硬的,但是已经设计并证明了几种多项式时算法可以产生施泰纳树的总重量,其总重量在Steiner最小树的2倍之内。在本文中,我们提出了一个并行的2-符号施加施罐最小树算法及其基于MPI的分布式实现。代替所有成对种子顶点之间的距离计算,这是许多算法中昂贵的阶段,我们的解决方案利用了Voronoi细胞计算。同样,这种方法比其他涉及整个图上涉及最小生成树计算的其他方法具有更高的平行效率。此外,我们的分布式设计利用异步处理和消息优先级方案,以加速距离计算的收敛性,并利用顶点和边缘中心处理以提供快速的时间到解决方案。我们使用具有多达1280亿个边缘和512个计算节点(8K过程)的现实世界图(8K过程)来证明解决方案的可扩展性和性能。我们将解决方案与最先进的steiner最小树求解器,SCIP杰克和两种串行算法进行了比较。我们的解决方案舒适地胜过这些相关作品的图表,并具有1000万边缘,并提供了不错的缩放 - 效率高达90%。我们从经验上表明,平均而言,我们的解决方案识别的施泰纳树的总距离比Steiner最小树高1.0527倍 - 在理论界限内,小于2。
Given an edge-weighted graph and a set of known seed vertices, a network scientist often desires to understand the graph relationships to explain connections between the seed vertices. When the seed set is 3 or larger Steiner minimal tree - min-weight acyclic connected subgraph (of the input graph) that contains all the seed vertices - is an attractive generalization of shortest weighted paths. In general, computing a Steiner minimal tree is NP-hard, but several polynomial-time algorithms have been designed and proven to yield Steiner trees whose total weight is bounded within 2 times the Steiner minimal tree. In this paper, we present a parallel 2-approximation Steiner minimal tree algorithm and its MPI-based distributed implementation. In place of distance computation between all pairs of seed vertices, an expensive phase in many algorithms, our solution exploits Voronoi cell computation. Also, this approach has higher parallel efficiency than others that involve minimum spanning tree computation on the entire graph. Furthermore, our distributed design exploits asynchronous processing and a message prioritization scheme to accelerate convergence of distance computation, and harnesses both vertex and edge centric processing to offer fast time-to-solution. We demonstrate scalability and performance of our solution using real-world graphs with up to 128 billion edges and 512 compute nodes (8K processes). We compare our solution with the state-of-the-art exact Steiner minimal tree solver, SCIP-Jack, and two serial algorithms. Our solution comfortably outperforms these related works on graphs with 10s million edges and offers decent strong scaling - up to 90% efficient. We empirically show that, on average, the total distance of the Steiner tree identified by our solution is 1.0527 times greater than the Steiner minimal tree - well within the theoretical bound of less than equal to 2.