Paper title
A Scalable and Energy Efficient GPU Thread Map for m-Simplex Domains
Paper authors
Paper abstract
This work proposes a new GPU thread map for $m$-simplex domains that scales its speedup with dimension and is energy efficient compared to other state-of-the-art approaches. The main contributions of this work are i) the formulation of the new block-space map $\mathcal{H}: \mathbb{Z}^m \to \mathbb{Z}^m$ for regular orthogonal simplex domains, which is analyzed in terms of resource usage, and ii) an experimental evaluation in terms of speedup over a bounding-box approach and of energy efficiency, measured as elements per second per Watt. Results from the analysis show that $\mathcal{H}$ has a potential speedup of up to $2\times$ and $6\times$ for $2$- and $3$-simplices, respectively. Experimental evaluation shows that $\mathcal{H}$ is competitive for $2$-simplices, reaching speedups of $1.2\times$ to $2.0\times$ across different tests, which is on par with the fastest state-of-the-art approaches. For $3$-simplices, $\mathcal{H}$ reaches speedups of $1.3\times$ to $6.0\times$, making it the fastest of all. The extension of $\mathcal{H}$ to higher-dimensional $m$-simplices is feasible and has a potential speedup that scales as $m!$, given a proper selection of the parameters $r$ and $\beta$, the scaling and replication factors, respectively. In terms of energy consumption, although $\mathcal{H}$ is among the approaches with the highest power draw, it compensates with its short running time, making it one of the most energy-efficient approaches. Lastly, further improvements using Tensor Cores and Ray Tracing Cores are analyzed, giving insights into how to leverage each of them. The results obtained in this work show that $\mathcal{H}$ is a scalable and energy-efficient map that can improve the efficiency of GPU applications that need to process $m$-simplex domains, such as Cellular Automata or PDE simulations.
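The potential-speedup bounds quoted in the abstract can be understood by counting threads. A bounding-box map launches one thread per cell of the $n^m$ hypercube, while a discrete orthogonal $m$-simplex of side $n$ (assumed here to include the diagonal) holds only $\binom{n+m-1}{m} = n(n+1)\cdots(n+m-1)/m!$ cells, so the ratio of launched to useful threads is bounded by $m!$ and approaches it as $n$ grows ($2\times$ for $m=2$, $6\times$ for $m=3$). The Python sketch below checks this counting argument and, for contrast, shows a classic square-root map for 2-simplices plus the elements-per-second-per-Watt metric. It is not an implementation of $\mathcal{H}$, and all function names are hypothetical.

```python
from math import comb, factorial, isqrt

# Illustrative sketch only: these helpers are hypothetical and are NOT the
# paper's H map. They count threads for a bounding-box launch and show a
# classic square-root map for 2-simplices, for comparison.


def simplex_size(n: int, m: int) -> int:
    """Cells in a discrete orthogonal m-simplex of side n (assumed to include
    the diagonal): tuples (x_1, ..., x_m), x_i >= 0, with sum <= n - 1."""
    return comb(n + m - 1, m)


def bounding_box_size(n: int, m: int) -> int:
    """Threads launched by a bounding-box map: the whole n^m hypercube."""
    return n ** m


def potential_speedup(n: int, m: int) -> float:
    """Launched threads over useful threads; bounded by m! and tends to it."""
    return bounding_box_size(n, m) / simplex_size(n, m)


def linear_to_triangle(k: int) -> tuple[int, int]:
    """Map a linear index k to (i, j) with 0 <= j <= i in a 2-simplex by
    inverting k = i * (i + 1) / 2 + j. Integer isqrt keeps it exact."""
    i = (isqrt(8 * k + 1) - 1) // 2
    j = k - i * (i + 1) // 2
    return i, j


def energy_efficiency(elements: int, seconds: float, watts: float) -> float:
    """Elements processed per second per Watt (higher is better)."""
    return elements / (seconds * watts)


if __name__ == "__main__":
    # Compare the launched/useful thread ratio with m! for a side of 1024.
    for m in (2, 3, 4):
        print(f"m={m}: ratio={potential_speedup(1024, m):.3f}, m!={factorial(m)}")
```

Running the module prints ratios slightly below $2$, $6$ and $24$, consistent with the $m!$ bound. Because elements per second per Watt equals elements per Joule, a kernel with a higher power draw can still be more energy efficient if its running time shrinks by a larger factor, which is the trade-off the abstract describes for $\mathcal{H}$. The integer square root is used in this sketch to avoid floating-point rounding for large indices.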