Paper Title

Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scientific Computing

Authors

Sfiligoi, I., Schultz, D., Riedel, B., Wuerthwein, F., Barnet, S., Brik, V.

Abstract

Scientific computing needs are growing dramatically over time and are expanding into science domains that were previously not compute intensive. When compute workflows spike well in excess of the capacity of their local compute resources, capacity should be temporarily provisioned from elsewhere, both to meet deadlines and to increase scientific output. Public Clouds have become an attractive option because they can be provisioned with minimal advance notice. However, the available capacity of cost-effective instances is not well understood. This paper presents the expansion of IceCube's production HTCondor pool using cost-effective GPU instances in preemptible mode gathered from the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and the Google Cloud Platform. Using this setup, we sustained about 15k GPUs for a whole workday, corresponding to around 170 PFLOP32s, integrating over one EFLOP32 hour worth of science output for a price tag of about $60k. In this paper, we provide the reasoning behind the Cloud instance selection, a description of the setup, an analysis of the provisioned resources, and a short description of the actual science output of the exercise.
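The headline figures in the abstract can be related to each other with some simple arithmetic. The sketch below is only an illustration built from the rounded numbers quoted above; the variable names are hypothetical, and treating "over one EFLOP32 hour" as exactly one is an assumption that makes the derived values bounds rather than measurements.

```python
# Rounded figures quoted in the abstract (all approximate).
gpus = 15_000                    # "about 15k GPUs"
sustained_pflop32s = 170.0       # "around 170 PFLOP32s"
integrated_eflop32_hours = 1.0   # "over one EFLOP32 hour", taken as a lower bound (assumption)
cost_usd = 60_000.0              # "about $60k"

# Average fp32 throughput per GPU, in TFLOP32s (1 PFLOP32s = 1000 TFLOP32s).
tflop32s_per_gpu = sustained_pflop32s * 1_000 / gpus

# Hours at the sustained rate needed to integrate one EFLOP32 hour
# (1 EFLOP32 hour = 1000 PFLOP32 hours).
hours_at_sustained_rate = integrated_eflop32_hours * 1_000 / sustained_pflop32s

# Upper bound on the cost per EFLOP32 hour, since the output exceeded one.
max_cost_per_eflop32_hour = cost_usd / integrated_eflop32_hours

print(f"{tflop32s_per_gpu:.1f} TFLOP32s per GPU on average")
print(f"{hours_at_sustained_rate:.1f} hours at the sustained rate per EFLOP32 hour")
print(f"at most ${max_cost_per_eflop32_hour:,.0f} per EFLOP32 hour")
```

With these inputs the average comes out to roughly 11.3 TFLOP32s per GPU, and one EFLOP32 hour corresponds to about 5.9 hours at the sustained rate, which is consistent with a run lasting a workday.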
