Paper Title


SOL: Effortless Device Support for AI Frameworks without Source Code Changes

Paper Authors

Nicolas Weber, Felipe Huici

Paper Abstract


Modern high performance computing clusters heavily rely on accelerators to overcome the limited compute power of CPUs. These supercomputers run various applications from different domains such as simulations, numerical applications or artificial intelligence (AI). As a result, vendors need to be able to efficiently run a wide variety of workloads on their hardware. In the AI domain this is particularly exacerbated by the existence of a number of popular frameworks (e.g., PyTorch, TensorFlow, etc.) that have no common code base and can vary in functionality. The code of these frameworks evolves quickly, making it expensive to keep up with all changes and potentially forcing developers to go through constant rounds of upstreaming. In this paper we explore how to provide hardware support in AI frameworks without changing the framework's source code, in order to minimize maintenance overhead. We introduce SOL, an AI acceleration middleware that provides a hardware abstraction layer, allowing us to transparently support heterogeneous hardware. As a proof of concept, we implemented SOL for PyTorch with three backends: CPUs, GPUs and vector processors.
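The middleware idea described in the abstract can be illustrated with a small, hypothetical PyTorch sketch. The names optimize and OptimizedModule below are illustrative placeholders, not SOL's actual API; they only show how a drop-in wrapper can redirect execution to a chosen backend device while the framework source and the user-facing nn.Module interface stay unchanged.

    import torch
    import torch.nn as nn

    class OptimizedModule(nn.Module):
        """Hypothetical stand-in for a middleware-optimized model.

        It preserves the nn.Module interface, so neither user code nor the
        framework needs source changes; only the forward pass is redirected
        to a backend-specific implementation (here simply the original
        module moved to the chosen device as a placeholder).
        """
        def __init__(self, module: nn.Module, device: str):
            super().__init__()
            self.device = torch.device(device)
            # A real middleware would lower `module` to optimized kernels
            # for the target backend (CPU, GPU, vector processor, ...).
            self.backend = module.to(self.device)

        def forward(self, *args):
            args = tuple(a.to(self.device) for a in args)
            return self.backend(*args)

    def optimize(module: nn.Module, device: str = "cpu") -> nn.Module:
        """Illustrative entry point: returns a drop-in replacement module."""
        return OptimizedModule(module, device)

    if __name__ == "__main__":
        model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
        model = optimize(model, "cuda" if torch.cuda.is_available() else "cpu")
        out = model(torch.randn(8, 16))  # user-facing call is unchanged
        print(out.shape)

In this sketch the only change visible to the user is the single optimize() call; everything downstream continues to treat the result as an ordinary PyTorch module, which is the property the abstract refers to as transparent support for heterogeneous hardware.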
