Predicting Performance of Heterogeneous AI Systems with Discrete-Event Simulations

Paper Title

Predicting Performance of Heterogeneous AI Systems with Discrete-Event Simulations

Authors

Vyacheslav Zhdanovskiy, Lev Teplyakov, Anton Grigoryev

Abstract

In recent years, artificial intelligence (AI) technologies have found industrial applications in various fields. AI systems typically combine complex software with a heterogeneous CPU/GPU hardware architecture, which makes it difficult to answer basic questions about performance evaluation and software optimization. Where is the bottleneck impeding the system? How does the performance scale with the workload? How would the speed-up of a specific module contribute to the whole system? Finding the answers to these questions through experiments on the real system could require a lot of computational, human, financial, and time resources. A solution to cut these costs is to use a fast and accurate simulation model before implementing anything in the real system. In this paper, we propose a discrete-event simulation model of a high-load heterogeneous AI system in the context of video analytics. Using the proposed model, we estimate: 1) the performance scalability with an increasing number of cameras; 2) the performance impact of integrating a new module; 3) the performance gain from optimizing a single module. We show that the performance estimation accuracy of the proposed model is higher than 90%. We also demonstrate that the considered system possesses a counter-intuitive relationship between workload and performance, which is nevertheless correctly inferred by the proposed simulation model.
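
To make the idea of a discrete-event simulation concrete, the following is a minimal sketch, not the authors' model. It assumes a toy system in which every camera emits one frame per fixed period and a single shared processing module handles frames one at a time in arrival order. The function name `simulate` and all parameter values (40 ms frame period, 10 ms service time, 10 s horizon) are hypothetical and chosen only for illustration.

```python
import heapq


def simulate(num_cameras, frame_period_ms=40, service_ms=10, horizon_ms=10_000):
    """Count frames completed by one shared module within the horizon.

    Toy assumptions (not from the paper): fixed frame period per camera,
    fixed service time, one FIFO queue, integer time in milliseconds.
    """
    events = []  # min-heap of (time_ms, sequence_number, kind)
    seq = 0      # tie-breaker so simultaneous events keep insertion order

    def schedule(time_ms, kind):
        nonlocal seq
        heapq.heappush(events, (time_ms, seq, kind))
        seq += 1

    # Every camera emits its first frame at time 0.
    for _ in range(num_cameras):
        schedule(0, "arrival")

    waiting = 0      # frames queued for the module
    busy = False     # whether the module is processing a frame
    completed = 0

    while events:
        now, _, kind = heapq.heappop(events)
        if now > horizon_ms:
            break
        if kind == "arrival":
            waiting += 1
            # The same camera emits its next frame one period later.
            schedule(now + frame_period_ms, "arrival")
        else:  # "done": the module finished a frame
            busy = False
            completed += 1
        # Start the next frame if the module is idle and work is queued.
        if not busy and waiting > 0:
            waiting -= 1
            busy = True
            schedule(now + service_ms, "done")

    return completed


if __name__ == "__main__":
    for cameras in (1, 2, 4, 8):
        print(cameras, "cameras:", simulate(cameras), "frames completed")
```

Under these assumed values the module can complete at most one frame every 10 ms, so the completed-frame count stops growing once four cameras keep it continuously busy. A sweep over camera counts like this one is the kind of scalability question the abstract describes, although the real system involves heterogeneous CPU/GPU modules that this sketch does not model.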
