Paper Title

Evolution of the ROOT Tree I/O

Authors

Jakob Blomer, Philippe Canal, Axel Naumann, Danilo Piparo

Abstract

The ROOT TTree data format encodes hundreds of petabytes of High Energy and Nuclear Physics events. Its columnar layout drives rapid analyses, because only those parts ("branches") that are actually used in a given analysis need to be read from storage. Its unique feature is seamless C++ integration, which allows users to store their event classes directly without explicitly defining data schemas. In this contribution, we present the status and plans of the future ROOT 7 event I/O. Along with the ROOT 7 interface modernization, we aim for robust and, where possible, compile-time safe C++ interfaces to read and write event data. On the performance side, we show first benchmarks using ROOT's new experimental I/O subsystem, which combines the best of TTrees with recent advances in columnar data formats. A core ingredient is a strong separation of the high-level logical data layout (C++ classes) from the low-level physical data layout (storage-backed nested vectors of simple types). We show how the new, optimized physical data layout speeds up serialization and deserialization and facilitates parallel, vectorized, and bulk operations. This lets ROOT I/O run optimally on the upcoming ultra-fast NVRAM storage devices, as well as on file-less storage systems such as object stores.
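The separation of logical and physical layout described in the abstract can be illustrated with a minimal, self-contained C++ sketch. It does not use ROOT and is not ROOT's actual API or on-disk format. The names `Track`, `Event`, `EventColumns`, `Fill`, `SumPt`, and `Read` are hypothetical, and in-memory `std::vector` columns stand in for storage-backed ones. The sketch assumes a nested collection is stored as one offset column plus one value column per simple-typed field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Logical layout: the classes a user would write (hypothetical event model).
struct Track {
    float pt;
    float eta;
};

struct Event {
    std::uint32_t id;
    std::vector<Track> tracks;
};

// Physical layout: one contiguous column of a simple type per field.
// The nested vector<Track> becomes an offset column plus two value columns.
struct EventColumns {
    std::vector<std::uint32_t> id;        // one element per event
    std::vector<std::uint64_t> trackEnd;  // per event: index one past its last track
    std::vector<float> trackPt;           // one element per track, across all events
    std::vector<float> trackEta;          // one element per track, across all events
};

// Decompose one logical event into the columns.
void Fill(EventColumns &cols, const Event &ev) {
    cols.id.push_back(ev.id);
    for (const Track &t : ev.tracks) {
        cols.trackPt.push_back(t.pt);
        cols.trackEta.push_back(t.eta);
    }
    cols.trackEnd.push_back(cols.trackPt.size());
}

// An "analysis" that needs only pt: it touches the offset column and the
// pt column, and never reads the id or eta columns.
float SumPt(const EventColumns &cols, std::size_t entry) {
    assert(entry < cols.trackEnd.size());
    const std::size_t begin = (entry == 0) ? 0 : cols.trackEnd[entry - 1];
    const std::size_t end = cols.trackEnd[entry];
    float sum = 0.f;
    for (std::size_t i = begin; i < end; ++i) {
        sum += cols.trackPt[i];
    }
    return sum;
}

// Reassemble the full logical object from all columns.
Event Read(const EventColumns &cols, std::size_t entry) {
    assert(entry < cols.trackEnd.size());
    const std::size_t begin = (entry == 0) ? 0 : cols.trackEnd[entry - 1];
    const std::size_t end = cols.trackEnd[entry];
    Event ev{cols.id[entry], {}};
    for (std::size_t i = begin; i < end; ++i) {
        ev.tracks.push_back(Track{cols.trackPt[i], cols.trackEta[i]});
    }
    return ev;
}
```

Because each column is a flat array of one simple type, a reader that needs only `pt` can skip every other column, and the per-column loops are natural candidates for bulk and vectorized processing.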
