Paper Title

Fast Columnar Physics Analyses of Terabyte-Scale LHC Data on a Cache-Aware Dask Cluster

Authors

Niclas Eich, Martin Erdmann, Peter Fackeldey, Benjamin Fischer, Dennis Noll, Yannik Rath

Abstract

The development of an LHC physics analysis involves numerous investigations that require the repeated processing of terabytes of data. Thus, a rapid completion of each of these analysis cycles is central to mastering the science project. We present a solution to efficiently handle and accelerate physics analyses on small-size institute clusters. Our solution is based on three key concepts: Vectorized processing of collision events, the "MapReduce" paradigm for scaling out on computing clusters, and efficiently utilized SSD caching to reduce latencies in IO operations. Using simulations from a Higgs pair production physics analysis as an example, we achieve an improvement factor of $6.3$ in runtime after one cycle and even an overall speedup of a factor of $14.9$ after $10$ cycles.
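The first two concepts named in the abstract, vectorized processing of events and the "MapReduce" paradigm, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the selection cuts, the binning, and all function names (`make_chunk`, `map_chunk`, `reduce_counts`) are assumptions made for illustration, and the map step runs locally with Python's built-in `map` rather than on a Dask cluster. SSD caching is not modeled here.

```python
from functools import reduce

import numpy as np

# Hypothetical binning: 20 bins of width 10 from 0 to 200 (arbitrary units).
BINS = np.linspace(0.0, 200.0, 21)


def make_chunk(n_events, seed):
    """Generate toy columnar data: one array per event property (not real LHC data)."""
    rng = np.random.default_rng(seed)
    return {
        "pt": rng.exponential(scale=40.0, size=n_events),
        "eta": rng.uniform(-3.0, 3.0, size=n_events),
    }


def map_chunk(chunk):
    """Map step: select events with whole-array operations and histogram them."""
    # Vectorized selection: the cuts apply to all events at once,
    # without a per-event Python loop. The cut values are assumptions.
    mask = (chunk["pt"] > 30.0) & (np.abs(chunk["eta"]) < 2.4)
    counts, _ = np.histogram(chunk["pt"][mask], bins=BINS)
    return counts


def reduce_counts(a, b):
    """Reduce step: partial histograms with identical binning simply add."""
    return a + b


# Each chunk stands in for one file or partition of the dataset.
chunks = [make_chunk(10_000, seed) for seed in range(4)]
partials = list(map(map_chunk, chunks))   # on a cluster, this map would be distributed
total = reduce(reduce_counts, partials)   # merge partial results into one histogram
```

Because each chunk is processed independently and the partial histograms merge by addition, the map step can be spread across workers without changing the result.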
