Ginex: SSD-enabled Billion-scale Graph Neural Network Training on a Single Machine via Provably Optimal In-memory Caching

Paper Title

Ginex: SSD-enabled Billion-scale Graph Neural Network Training on a Single Machine via Provably Optimal In-memory Caching

Authors

Park, Yeonhong; Min, Sunhong; Lee, Jae W.

Abstract

Recently, Graph Neural Networks (GNNs) have been receiving a spotlight as a powerful tool that can effectively serve various inference tasks on graph structured data. As the size of real-world graphs continues to scale, the GNN training system faces a scalability challenge. Distributed training is a popular approach to address this challenge by scaling out CPU nodes. However, not much attention has been paid to disk-based GNN training, which can scale up the single-node system in a more cost-effective manner by leveraging high-performance storage devices like NVMe SSDs. We observe that the data movement between the main memory and the disk is the primary bottleneck in the SSD-based training system, and that the conventional GNN training pipeline is sub-optimal without taking this overhead into account. Thus, we propose Ginex, the first SSD-based GNN training system that can process billion-scale graph datasets on a single machine. Inspired by the inspector-executor execution model in compiler optimization, Ginex restructures the GNN training pipeline by separating sample and gather stages. This separation enables Ginex to realize a provably optimal replacement algorithm, known as Belady's algorithm, for caching feature vectors in memory, which account for the dominant portion of I/O accesses. According to our evaluation with four billion-scale graph datasets, Ginex achieves 2.11x higher training throughput on average (up to 2.67x at maximum) than the SSD-extended PyTorch Geometric.
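
To illustrate the caching idea named in the abstract, here is a minimal sketch of Belady's replacement policy: on a miss with a full cache, evict the entry whose next use lies farthest in the future. It assumes the full access sequence is known ahead of time, which is what separating the sample stage from the gather stage makes possible. The function name `belady_misses` and the toy trace are hypothetical and are not taken from the Ginex codebase; this linear-scan version is for clarity only and says nothing about how Ginex implements the policy at scale.

```python
from collections import defaultdict, deque


def belady_misses(accesses, capacity):
    """Count cache misses under Belady's (farthest-in-future) replacement.

    `accesses` is the complete, known-in-advance sequence of keys
    (for example, node IDs whose feature vectors will be gathered).
    """
    # Pre-pass: record every position at which each key is accessed.
    future = defaultdict(deque)
    for pos, key in enumerate(accesses):
        future[key].append(pos)

    cache = set()
    misses = 0
    for key in accesses:
        future[key].popleft()  # consume the current access
        if key in cache:
            continue  # hit
        misses += 1
        if capacity <= 0:
            continue  # nothing can be cached
        if len(cache) >= capacity:
            # Evict the cached key whose next use is farthest away;
            # keys that are never used again count as infinitely far.
            victim = max(
                cache,
                key=lambda k: future[k][0] if future[k] else float("inf"),
            )
            cache.remove(victim)
        cache.add(key)
    return misses


# Toy trace (hypothetical): 12 accesses over 5 distinct keys, cache of 3.
trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(belady_misses(trace, 3))  # prints 7
```

The policy needs the future access order, so it cannot be applied when sampling and feature gathering are interleaved; the sketch works only because the whole trace is passed in up front.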

Paper Title