Paper title
Wide-Area Data Analytics
Paper authors
Paper abstract
We increasingly live in a data-driven world, with diverse kinds of data distributed across many locations. In some cases, the datasets are collected from multiple locations, such as sensors (e.g., mobile phones and street cameras) spread throughout a geographic region. The data may need to be analyzed close to where they are produced, particularly when the applications must meet requirements for low latency, low cost, user privacy, or regulatory compliance. In other cases, large datasets are distributed across public clouds, private clouds, or edge-cloud computing sites with more plentiful computation, storage, bandwidth, and energy resources. Often, some portion of the analysis may take place on the end-host or edge cloud (to respect user privacy and reduce the volume of data) while relying on remote clouds to complete the analysis (to leverage greater computation and storage resources). Wide-area data analytics is any analysis of data that is generated by, or stored at, geographically dispersed entities. Over the past few years, several parts of the computer science research community have started to explore effective ways to analyze data spread over multiple locations. In particular, several areas of "systems" research, including databases, distributed systems, computer networking, and security and privacy, have delved into these topics. These research subcommunities often focus on different aspects of the problem, consider different motivating applications and use cases, and design and evaluate their solutions differently. To address these challenges, the Computing Community Consortium (CCC) convened a 1.5-day workshop focused on wide-area data analytics in October 2019. This report summarizes the challenges discussed and the conclusions reached at the workshop.