Paper Title

Parallel Vertex Cover Algorithms on GPUs

Paper Authors

Peter Yamout, Karim Barada, Adnan Jaljuli, Amer E. Mouawad, Izzat El Hajj

Paper Abstract

Finding small vertex covers in a graph has applications in numerous domains. Two common formulations of the problem are Minimum Vertex Cover, which finds the smallest vertex cover in a graph, and Parameterized Vertex Cover, which finds a vertex cover whose size is less than or equal to some parameter $k$. Algorithms for both formulations traverse a search tree, which grows exponentially with the size of the graph or the value of $k$. Parallelizing the traversal of the vertex cover search tree on GPUs is challenging for multiple reasons. First, the search tree is a narrow binary tree, which makes it difficult to extract enough sub-trees to process in parallel to fully utilize the GPU's resources. Second, the search tree is highly imbalanced, which makes load balancing across a massive number of parallel GPU workers challenging. Third, keeping around all the intermediate state needed to traverse many sub-trees in parallel puts high pressure on the GPU's memory resources and may act as a limiting factor to parallelism. To address these challenges, we propose an approach to traversing the vertex cover search tree in parallel on GPUs while handling dynamic load balancing. Each thread block traverses a different sub-tree using a local stack; however, we also use a global worklist to balance load. Blocks contribute branches of their sub-trees to the global worklist on an as-needed basis, while blocks that finish their sub-trees obtain new ones from the global worklist. We use degree arrays to represent intermediate graphs so that the representation is compact in memory, to avoid limiting parallelism, yet self-contained, which is necessary for load balancing. Our evaluation shows that, compared to prior work, our hybrid approach of using local stacks and a global worklist substantially improves performance and reduces load imbalance, especially on difficult instances of the problem.
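
To make the structure of the search tree concrete, below is a minimal sequential sketch of the standard branching algorithm for Parameterized Vertex Cover, i.e., the kind of search tree the paper traverses in parallel on the GPU. All names here (`Graph`, `vertex_cover_k`, `remove_vertex`) are illustrative and not taken from the paper's implementation; the sketch keeps a per-subproblem degree array and removed-vertex markers only to echo the compact, self-contained intermediate representation the abstract describes.

```cpp
#include <iostream>
#include <vector>

// Illustrative state for one node of the search tree: the intermediate graph
// is the original adjacency lists plus a degree array and removed markers.
struct Graph {
    std::vector<std::vector<int>> adj;   // adjacency lists (undirected)
    std::vector<int> degree;             // current degree of each vertex
    std::vector<bool> removed;           // vertices deleted on this branch
};

// Remove vertex v from the current intermediate graph, updating degrees.
static void remove_vertex(Graph& g, int v) {
    g.removed[v] = true;
    for (int u : g.adj[v])
        if (!g.removed[u]) --g.degree[u];
    g.degree[v] = 0;
}

// Returns true iff the remaining graph has a vertex cover of size <= k.
// Each call corresponds to one node of the binary search tree: branch on a
// highest-degree vertex v, taking either v or all of its remaining neighbors.
static bool vertex_cover_k(Graph& g, int k) {
    int v = -1;
    for (int u = 0; u < (int)g.adj.size(); ++u)
        if (!g.removed[u] && (v == -1 || g.degree[u] > g.degree[v])) v = u;
    if (v == -1 || g.degree[v] == 0) return true;   // no edges left: covered
    if (k <= 0) return false;                       // edges remain, budget spent

    // Branch 1: put v in the cover.
    Graph left = g;                                  // copy the intermediate state
    remove_vertex(left, v);
    if (vertex_cover_k(left, k - 1)) return true;

    // Branch 2: put every remaining neighbor of v in the cover.
    Graph right = g;
    int taken = 0;
    for (int u : g.adj[v])
        if (!right.removed[u]) { remove_vertex(right, u); ++taken; }
    return taken <= k && vertex_cover_k(right, k - taken);
}

int main() {
    // 5-cycle: the minimum vertex cover has size 3.
    Graph g;
    g.adj = {{1, 4}, {0, 2}, {1, 3}, {2, 4}, {3, 0}};
    g.degree = {2, 2, 2, 2, 2};
    g.removed.assign(5, false);
    std::cout << vertex_cover_k(g, 2) << " " << vertex_cover_k(g, 3) << "\n"; // 0 1
}
```

In the GPU setting the abstract describes, each thread block would explore one such sub-tree using a local stack of these intermediate states, donating branches to a global worklist as needed so that idle blocks can pick up new sub-trees.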
