Paper Title
Raptor Zonal Statistics: Fully Distributed Zonal Statistics of Big Raster + Vector Data [Pre-Print]
Paper Authors
Abstract
Recent advancements in remote sensing technology have resulted in petabytes of data in raster format. This data is often processed in combination with high-resolution vector data that represents, for example, city boundaries. One of the common operations that combine big raster and vector data is zonal statistics, which computes a set of statistics for each polygon in the vector dataset. This paper models the zonal statistics problem as a join problem and proposes a novel distributed system that can scale to petabytes of raster and vector data. The proposed method does not require any preprocessing or indexing, which makes it well suited to the ad-hoc queries that scientists usually want to run. We devise a theoretical cost model that proves the efficiency of our algorithm over the baseline method. Furthermore, we run an extensive experimental evaluation on large-scale satellite data with up to a trillion pixels and big vector data with up to hundreds of millions of edges, and we show that our method can perfectly scale to big data with up to two orders of magnitude performance gain over Rasdaman and Google Earth Engine.
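To make the operation concrete, the following is a minimal single-machine sketch of zonal statistics in plain Python. It is not the distributed join-based method the paper proposes; it is a naive illustration that tests every pixel center against every polygon. The function names (`point_in_polygon`, `zonal_statistics`), the choice of pixel centers as the membership test, and the use of pixel-grid coordinates for polygon vertices are all assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, ring: Sequence[Point]) -> bool:
    """Even-odd ray casting test for a simple polygon given as a vertex ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zonal_statistics(
    raster: List[List[float]],
    polygons: Dict[str, Sequence[Point]],
) -> Dict[str, Dict[str, float]]:
    """Naive zonal statistics: scan all pixels for each polygon.

    Assumption: the pixel at (row, col) covers the unit square whose
    center is (col + 0.5, row + 0.5), and polygons use the same grid
    coordinates. A pixel belongs to a polygon if its center is inside.
    """
    results: Dict[str, Dict[str, float]] = {}
    for name, ring in polygons.items():
        values = [
            value
            for row, line in enumerate(raster)
            for col, value in enumerate(line)
            if point_in_polygon(col + 0.5, row + 0.5, ring)
        ]
        if values:
            results[name] = {
                "count": len(values),
                "sum": sum(values),
                "min": min(values),
                "max": max(values),
                "mean": sum(values) / len(values),
            }
        else:
            # No pixel center falls inside this polygon.
            results[name] = {"count": 0}
    return results
```

This sketch costs time proportional to the number of pixels times the number of polygon edges, which is impractical at the data sizes the abstract describes and is the kind of cost that motivates a scalable approach.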