Paper Title

ML-AQP: Query-Driven Approximate Query Processing based on Machine Learning

Authors

Fotis Savva, Christos Anagnostopoulos, Peter Triantafillou

Abstract

As more and more organizations rely on data-driven decision making, large-scale analytics becomes increasingly important. However, analysts are often stuck waiting for exact results. As such, organizations turn to Cloud providers that have the infrastructure to analyze large quantities of data efficiently. But as costs increase, organizations have to optimize their usage. A cheap alternative that provides speed and efficiency would go a long way. Concretely, we offer a solution that relies on Machine Learning (ML) to provide approximate answers to aggregate queries and that can work alongside Cloud systems. Our lightweight ML-led system can be stored on an analyst's local machine or deployed as a service to answer analytic queries instantly, with low response times, low monetary and computational costs, and a small energy footprint. To accomplish this, we leverage the knowledge obtained from previously answered queries and build ML models that can estimate the results of new queries efficiently and inexpensively. The capabilities of our system are demonstrated through extensive evaluation with both real and synthetic datasets and workloads, as well as well-known benchmarks.
