Paper title
Machine Learning-based Cardinality Estimation in DBMS on Pre-Aggregated Data
Paper authors
Paper abstract
Cardinality estimation is a fundamental task in database query processing and optimization. As recent papers have shown, machine learning (ML)-based approaches can deliver more accurate cardinality estimates than traditional approaches. However, many example queries have to be executed during the model training phase to learn a data-dependent ML model, which makes the training phase very time-consuming. Many of those example queries use the same base data, have the same query structure, and differ only in their predicates. Index structures therefore appear to be an ideal optimization technique at first glance, but their benefit is limited. To speed up the model training phase, our core idea is to determine a predicate-independent pre-aggregation of the base data and to execute the example queries over this pre-aggregated data. Based on this idea, we present a specific aggregate-enabled training phase for ML-based cardinality estimation approaches in this paper. As we show with different workloads in our evaluation, our aggregate-enabled training phase achieves an average speedup of 63.
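The core idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It only assumes a toy table with two hypothetical integer columns (a, b), and the names pre_aggregate, count_on_base, and count_on_aggregate are invented for illustration. The base rows are grouped once, independently of any predicate, into (distinct value combination, count) pairs. Each example query then sums the counts of the matching groups instead of scanning every base row.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical schema: each row holds the values of two columns (a, b).
Row = Tuple[int, int]
Predicate = Callable[[Row], bool]


def pre_aggregate(rows: List[Row]) -> Dict[Row, int]:
    """Group the base data once, independently of any predicate.

    Each distinct (a, b) combination is stored with its number of occurrences.
    """
    return dict(Counter(rows))


def count_on_base(rows: List[Row], predicate: Predicate) -> int:
    """Baseline: compute the true cardinality by scanning all base rows."""
    return sum(1 for row in rows if predicate(row))


def count_on_aggregate(aggregate: Dict[Row, int], predicate: Predicate) -> int:
    """Compute the same cardinality from the pre-aggregated data.

    The predicate is evaluated once per distinct group, and the stored
    counts of the matching groups are summed.
    """
    return sum(count for key, count in aggregate.items() if predicate(key))


# Toy base data (assumed for illustration only).
base_rows: List[Row] = [
    (1, 10), (1, 10), (2, 20), (2, 30), (1, 10), (3, 20), (3, 20),
]

# Built once, then reused for every example query.
aggregate = pre_aggregate(base_rows)

# Example queries share the same structure and differ only in their predicates.
example_predicates: List[Predicate] = [
    lambda r: r[0] == 1,
    lambda r: r[1] >= 20,
    lambda r: r[0] <= 2 and r[1] == 10,
]

# True cardinalities that would serve as training labels for an ML model.
labels = [count_on_aggregate(aggregate, p) for p in example_predicates]
```

In this sketch the saving comes from the aggregate having fewer entries than the base table, so it grows with the number of duplicate value combinations. The speedup reported in the abstract refers to the authors' evaluation, not to this toy example.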