Paper Title
GPU-Powered Spatial Database Engine for Commodity Hardware: Extended Version
Authors
Abstract
Given the massive growth in the volume of spatial data, there is a great need for systems that can efficiently evaluate spatial queries over large datasets. These queries are notoriously expensive using traditional database solutions. While faster response times can be attained through powerful clusters or servers with large main memory, these options are, due to their cost and complexity, out of reach for many of the data scientists and analysts who make up the long tail. Graphics Processing Units (GPUs), which are now widely available even in commodity desktops and laptops, provide a cost-effective alternative for high-performance computing, opening up new opportunities for the efficient evaluation of spatial queries. While GPU-based approaches proposed in the literature have shown great improvements in performance, they are tied to specific GPU hardware and handle only specific queries over fixed geometry types. In this paper, we present SPADE, a GPU-powered spatial database engine that supports a rich set of spatial queries. We discuss the challenges involved in attaining efficient query evaluation over large datasets as well as portability across different GPU hardware, and how these are addressed in SPADE. We performed a detailed experimental evaluation to assess the effectiveness of the system for a wide range of queries and datasets, and report results showing that SPADE is scalable, can handle data larger than main memory, and achieves performance on a laptop on par with that of other systems that require clusters or large-memory servers.