Paper Title
75,000,000,000 Streaming Inserts/Second Using Hierarchical Hypersparse GraphBLAS Matrices
Paper Authors
Paper Abstract
The SuiteSparse GraphBLAS C-library implements high-performance hypersparse matrices with bindings to a variety of languages (Python, Julia, and Matlab/Octave). GraphBLAS provides a lightweight in-memory database implementation of hypersparse matrices that are ideal for analyzing many types of network data, while providing rigorous mathematical guarantees, such as linearity. Streaming updates of hypersparse matrices put enormous pressure on the memory hierarchy. This work benchmarks an implementation of hierarchical hypersparse matrices that reduces memory pressure and dramatically increases the update rate into a hypersparse matrix. The parameters of hierarchical hypersparse matrices control the number of entries allowed in each level of the hierarchy before an update is cascaded to the next level. The parameters are easily tunable to achieve optimal performance for a variety of applications. Hierarchical hypersparse matrices achieve over 1,000,000 updates per second in a single instance. Scaling to 31,000 instances of hierarchical hypersparse matrix arrays on 1,100 server nodes on the MIT SuperCloud achieved a sustained update rate of 75,000,000,000 updates per second. This capability allows the MIT SuperCloud to analyze extremely large streaming network data sets.
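The cascading idea described in the abstract can be illustrated with a minimal sketch. It is a toy model, not the paper's implementation and not the GraphBLAS API: plain Python dictionaries stand in for hypersparse matrices, and the class name `HierarchicalSparse`, the parameter `cuts`, and the methods `update` and `total` are hypothetical names chosen for illustration. The sketch assumes that each level has an entry-count threshold and that a level exceeding its threshold is added into the next level and then cleared. Because addition is linear, summing all levels recovers the same matrix that direct updates would have produced.

```python
from collections import defaultdict


class HierarchicalSparse:
    """Toy hierarchical accumulator; dicts stand in for hypersparse matrices."""

    def __init__(self, cuts):
        # cuts[i] (hypothetical parameter) is the largest number of stored
        # entries level i may hold before it is cascaded into level i + 1.
        # The last level has no threshold.
        self.cuts = list(cuts)
        self.levels = [defaultdict(int) for _ in range(len(self.cuts) + 1)]

    def update(self, row, col, value=1):
        # Every streaming update lands in the smallest level first.
        self.levels[0][(row, col)] += value
        self._cascade()

    def _cascade(self):
        # Walk up the hierarchy, pushing any over-full level into the next one.
        for i, cut in enumerate(self.cuts):
            if len(self.levels[i]) <= cut:
                break
            upper = self.levels[i + 1]
            for key, val in self.levels[i].items():
                upper[key] += val
            self.levels[i].clear()

    def total(self):
        # Linearity: the full matrix is the sum of all levels.
        out = defaultdict(int)
        for level in self.levels:
            for key, val in level.items():
                out[key] += val
        return dict(out)


if __name__ == "__main__":
    h = HierarchicalSparse(cuts=[2, 4])
    for edge in [(0, 1), (0, 1), (2, 3), (5, 5), (0, 1)]:
        h.update(*edge)
    print(h.total())
```

In this sketch most updates touch only the small first level, which is the intuition behind the reduced memory pressure; the thresholds in `cuts` play the role of the tunable parameters mentioned in the abstract.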