Title
The anachronism of whole-GPU accounting
Authors
Abstract
NVIDIA has been making steady progress in increasing the compute performance of its GPUs, resulting in order-of-magnitude compute throughput improvements over the years. With several GPU models coexisting in many deployments, the traditional accounting method of treating all GPUs as equal no longer reflects compute output. Moreover, for applications that require significant CPU-based compute to complement the GPU-based compute, it is becoming increasingly difficult to make full use of the newer GPUs, requiring those GPUs to be shared between multiple applications in order to maximize the achievable science output. This further reduces the value of whole-GPU accounting, especially when the sharing is done at the infrastructure level. We thus argue that GPU accounting for throughput-oriented infrastructures should be expressed in GPU core hours, much as is normally done for CPUs. While GPU core compute throughput does change between GPU generations, the variability is similar to what we expect to see among CPU cores. To validate our position, we present an extensive set of run-time measurements of two IceCube photon propagation workflows on 14 GPU models, using both on-prem and Cloud resources. The measurements also outline the influence of GPU sharing at both the HTCondor and Kubernetes infrastructure levels.
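
To make the proposed unit concrete, the following is a minimal sketch (in Python) of how GPU core-hour accounting could be computed for a job record. The function name, job fields, and sharing fraction are illustrative assumptions, not part of the paper; the CUDA core counts are the published values for the listed NVIDIA models (the paper itself covers 14 models).

    # Minimal sketch of the GPU core-hour accounting argued for above.
    # CUDA core counts are published values for a few NVIDIA GPU models;
    # gpu_fraction models infrastructure-level sharing (e.g. HTCondor
    # slots or Kubernetes fractional GPU requests).
    CUDA_CORES = {
        "T4": 2560,    # NVIDIA T4
        "V100": 5120,  # NVIDIA V100
        "A100": 6912,  # NVIDIA A100
    }

    def gpu_core_hours(model: str, wall_hours: float, gpu_fraction: float = 1.0) -> float:
        # Charge in GPU core hours rather than whole-GPU hours,
        # analogous to CPU core-hour accounting.
        return CUDA_CORES[model] * wall_hours * gpu_fraction

    # A 2-hour job on half of a shared A100 is charged more core hours
    # than a 2-hour job occupying a whole T4:
    print(gpu_core_hours("A100", 2.0, gpu_fraction=0.5))  # 6912.0
    print(gpu_core_hours("T4", 2.0))                      # 5120.0

Under this scheme, whole-GPU sharing and generational differences in per-GPU core counts are both absorbed into the accounting unit, much as heterogeneous CPU fleets are already charged per core hour.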