Paper Title

Using Unused: Non-Invasive Dynamic FaaS Infrastructure with HPC-Whisk

Authors

Bartłomiej Przybylski, Maciej Pawlik, Paweł Żuk, Bartłomiej Łagosz, Maciej Malawski, Krzysztof Rzadca

Abstract

Modern HPC workload managers and their careful tuning contribute to the high utilization of HPC clusters. However, due to inevitable uncertainty, it is impossible to completely avoid node idleness. Although such idle slots are usually too short for any HPC job, they are too long to ignore. The Function-as-a-Service (FaaS) paradigm promises to fill this gap and can be a good match, as typical FaaS functions last seconds, not hours. Here we show how to build a FaaS infrastructure on idle nodes in an HPC cluster in such a way that it does not significantly affect the performance of the HPC jobs. We dynamically adapt to a changing set of idle physical machines by integrating the open-source software Slurm and OpenWhisk. We designed and implemented a prototype solution that allowed us to cover up to 90% of the idle time slots on a 50k-core cluster that runs production workloads.
