Paper Title
High Performance I/O For Large Scale Deep Learning
Paper Authors
Abstract
Training deep learning (DL) models on petascale datasets is essential for achieving competitive and state-of-the-art performance in applications such as speech, video analytics, and object recognition. However, existing distributed filesystems were not developed for the access patterns and usability requirements of DL jobs. In this paper, we describe AIStore, a highly scalable, easy-to-deploy storage system, and WebDataset, a standards-based storage format and library that permits efficient access to very large datasets. We compare system performance experimentally using image classification workloads and storing training data on a variety of backends, including local SSDs, single-node NFS, and two identical bare-metal clusters: HDFS and AIStore.
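The WebDataset format mentioned in the abstract stores datasets as plain POSIX tar archives in which all files sharing a basename (the part before the first dot) form one training sample, so shards can be read with purely sequential I/O. The sketch below illustrates that layout using only the Python standard library; the helper names `write_shard` and `read_shard` and the in-memory shard are illustrative assumptions, not the WebDataset library's API.

```python
# Sketch of the WebDataset storage layout: a shard is a POSIX tar archive,
# and files sharing a basename (before the first dot) form one sample.
# write_shard / read_shard are hypothetical helpers, not the real library API.
import io
import tarfile

def write_shard(samples):
    """Pack {key: {extension: bytes}} samples into an in-memory tar shard."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for key, fields in samples.items():
            for ext, data in fields.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf

def read_shard(buf):
    """Stream the shard sequentially, regrouping files into samples by key."""
    samples = {}
    with tarfile.open(fileobj=buf, mode="r") as tar:
        for member in tar:
            key, _, ext = member.name.partition(".")
            samples.setdefault(key, {})[ext] = tar.extractfile(member).read()
    return samples

shard = write_shard({
    "000000": {"jpg": b"<jpeg bytes>", "cls": b"0"},
    "000001": {"jpg": b"<jpeg bytes>", "cls": b"7"},
})
print(read_shard(shard)["000001"]["cls"])  # b'7'
```

Because samples are consumed in tar order, a DL job reads each shard as one large sequential stream, which is the access pattern the paper contrasts with the small-random-read pattern that general-purpose distributed filesystems were built for.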