通过在GPU上实现管道实现解决动态编程问题

论文标题

通过在GPU上实现管道实现解决动态编程问题

Solving Dynamic Programming Problem by Pipeline Implementation on GPU

论文作者

Matsumae, Susumu, Miyazaki, Makoto

论文摘要

在本文中，我们显示了GPU上动态编程（DP）的管道实现的有效性。例如，我们解释了如何通过DP在GPU上解决基质链乘法（MCM）问题。可以按$ o（n^3）$ sptes依次解决此问题，其中$ n $是矩阵的数量，因为其解决方案表的尺寸为$ n \ times n $，并且表的每个元素都可以以$ o（n）$ spest进行计算。为此，典型的加速策略是平行于每个元素的$ o（n）$ step计算，可以通过并行前缀计算来轻松实现，即以锦标赛方式使用$ n $ threads的$ o（\ log n）$ spep计算。通过这样的标准并行方法，我们可以用$ n $ threads解决$ O（n^2 \ log n）$ steps中的MCM问题。在我们的方法中，我们以管道方式解决了GPU上的MCM问题，即，我们使用GPU内核来支撑管道阶段，以便一次平行地平行解决解决方案表的许多元素。我们的实现将以管道方式使用$ n $螺纹的每个计算步骤确定一个输出值，并以$ o（n^2）$ sptep的$ n $ threads构建解决方案表。

In this paper, we show the effectiveness of a pipeline implementation of Dynamic Programming (DP) on GPU. As an example, we explain how to solve a matrix-chain multiplication (MCM) problem by DP on GPU. This problem can be sequentially solved in $O(n^3)$ steps by DP where $n$ is the number of matrices, because its solution table is of size $n \times n$ and each element of the table can be computed in $O(n)$ steps. A typical speedup strategy for this is to parallelize the $O(n)$ step computation of each element, which can be easily achieved by parallel prefix computation, i.e., an $O(\log n)$ step computation with $n$ threads in a tournament fashion. By such a standard parallelizing method, we can solve the MCM problem in $O(n^2 \log n)$ steps with $n$ threads. In our approach, we solve the MCM problem on GPU in a pipeline fashion, i.e., we use GPU cores for supporting pipeline-stages so that many elements of the solution table are partially computed in parallel at one time. Our implementation determines one output value per one computational step with $n$ threads in a pipeline fashion and constructs the solution table totally in $O(n^2)$ steps with $n$ threads.

下载PDF全文

下载文献需遵守相关版权规定

论文标题