AI-coupled HPC Workflows

Paper title

AI-coupled HPC Workflows

Authors

Shantenu Jha, Vincent R. Pascuzzi, Matteo Turilli

Abstract

Increasingly, scientific discovery requires sophisticated and scalable workflows. Workflows have become the "new applications," wherein multi-scale computing campaigns comprise multiple heterogeneous executable tasks. In particular, introducing AI/ML models into traditional HPC workflows has enabled highly accurate modeling, typically with lower computational cost than traditional methods. This chapter discusses various modes of integrating AI/ML models into HPC computations, which result in diverse types of AI-coupled HPC workflows. We motivate the increasing need to couple AI/ML and HPC across scientific domains, then illustrate each mode with a number of production-grade use cases. We also discuss the primary challenges of extreme-scale AI-coupled HPC campaigns (task heterogeneity, adaptivity, and performance) and several framework and middleware solutions that aim to address them. While HPC workflows and AI/ML computing paradigms are each effective independently, we highlight how their integration, and ultimate convergence, is leading to significant improvements in scientific performance across a range of domains, ultimately enabling scientific explorations that would otherwise be unattainable.
