Paper Title


Collaborative Cluster Configuration for Distributed Data-Parallel Processing: A Research Overview

Authors

Thamsen, Lauritz, Scheinert, Dominik, Will, Jonathan, Bader, Jonathan, Kao, Odej

Abstract


Many organizations routinely analyze large datasets using systems for distributed data-parallel processing and clusters of commodity resources. Yet, users need to configure adequate resources for their data processing jobs. This requires significant insights into expected job runtimes and scaling behavior, resource characteristics, input data distributions, and other factors. Unable to estimate performance accurately, users frequently overprovision resources for their jobs, leading to low resource utilization and high costs. In this paper, we present major building blocks towards a collaborative approach for optimization of data processing cluster configurations based on runtime data and performance models. We believe that runtime data can be shared and used for performance models across different execution contexts, significantly reducing the reliance on the recurrence of individual processing jobs or, else, dedicated job profiling. For this, we describe how the similarity of processing jobs and cluster infrastructures can be employed to combine suitable data points from local and global job executions into accurate performance models. Furthermore, we outline approaches to performance prediction via more context-aware and reusable models. Finally, we lay out how metrics from previous executions can be combined with runtime monitoring to effectively re-configure models and clusters dynamically.
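To make the core idea concrete, here is a minimal sketch of fitting a performance model to shared runtime data points and using it to predict a job's runtime at an untried scale-out. This is not the authors' implementation: the parametric form (runtime ≈ a + b/x + c·log x + d·x) is borrowed from Ernest-style scaling models, and the function names and all data values below are illustrative assumptions.

```python
import numpy as np

def fit_runtime_model(scale_outs, runtimes):
    """Least-squares fit of runtime ~ a + b/x + c*log(x) + d*x,
    where x is the number of nodes. Data points may come from local
    executions or, in a collaborative setting, from shared global ones.
    (Hypothetical helper; the model form is an Ernest-style assumption.)"""
    x = np.asarray(scale_outs, dtype=float)
    features = np.column_stack([np.ones_like(x), 1.0 / x, np.log(x), x])
    coeffs, *_ = np.linalg.lstsq(features, np.asarray(runtimes, dtype=float),
                                 rcond=None)
    return coeffs

def predict_runtime(coeffs, x):
    """Evaluate the fitted model at scale-out x."""
    x = float(x)
    return float(coeffs @ np.array([1.0, 1.0 / x, np.log(x), x]))

# Illustrative runtime data points (nodes, seconds) from prior executions.
nodes = [2, 4, 8, 16]
runtimes = [620.0, 340.0, 210.0, 150.0]

model = fit_runtime_model(nodes, runtimes)
print(f"Predicted runtime on 12 nodes: {predict_runtime(model, 12):.1f} s")
```

A user (or an automated configurator) could evaluate such a model over candidate cluster sizes and pick the cheapest configuration that meets a runtime target, instead of overprovisioning.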
