Paper Title
Detecting Straggler MapReduce Tasks in Big Data Processing Infrastructure by Neural Network
Paper Authors
Abstract
Straggler task detection is one of the main challenges in applying MapReduce to parallelize and distribute large-scale data processing. It is defined as detecting tasks that are running on weak nodes. Considering the two stages of the Map phase (copy and combine) and the three stages of the Reduce phase (shuffle, sort, and reduce), the total execution time is the sum of the execution times of these five stages. The primary purpose of this paper is to estimate the execution time of each stage correctly, which in turn yields a correct total execution time. The proposed method applies a backpropagation neural network (NN) on Hadoop to estimate the remaining execution time of tasks, which is essential for straggler task detection. The results are compared with popular algorithms in this domain, such as LATE and ESAMR, and with the real remaining time for the WordCount and Sort benchmarks; they show that the method can detect straggler tasks and estimate execution time accurately. It also helps accelerate task execution.
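The way per-stage times feed into a remaining-time estimate can be illustrated with a minimal sketch. It is not the paper's neural network: the stage weights below are hypothetical placeholders for values that a trained model (or historical data, as in ESAMR-style schemes) would supply, and the names `TaskSnapshot`, `progress_score`, `remaining_time`, `find_stragglers`, and the `slack` threshold are all assumptions made for illustration. The remaining time follows the progress-rate heuristic popularized by LATE: remaining = (1 - progress) / (progress / elapsed).

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskSnapshot:
    """Hypothetical view of one running task (not a Hadoop API type)."""
    task_id: str
    weights: tuple         # assumed share of total time per stage; sums to 1
                           # (2 entries for a map task, 3 for a reduce task)
    current_stage: int     # index of the stage currently in progress
    stage_progress: float  # fraction of the current stage completed, 0..1
    elapsed: float         # seconds since the task started


def progress_score(task):
    """Weighted progress: finished stages plus the partial current stage."""
    finished = sum(task.weights[:task.current_stage])
    return finished + task.weights[task.current_stage] * task.stage_progress


def remaining_time(task):
    """Estimate remaining seconds from the observed progress rate."""
    score = progress_score(task)
    if score <= 0 or task.elapsed <= 0:
        return float("inf")  # no progress observed yet, so no estimate
    rate = score / task.elapsed
    return (1.0 - score) / rate


def find_stragglers(tasks, slack=1.5):
    """Return ids of tasks whose estimate exceeds slack * median estimate.

    Tasks without a finite estimate are skipped, since they have not run
    long enough to be judged. The slack factor is an assumed threshold.
    """
    estimates = {t.task_id: remaining_time(t) for t in tasks}
    finite = {k: v for k, v in estimates.items() if v != float("inf")}
    if not finite:
        return []
    threshold = slack * median(finite.values())
    return [task_id for task_id, est in finite.items() if est > threshold]


if __name__ == "__main__":
    reduce_weights = (0.5, 0.2, 0.3)  # hypothetical shuffle/sort/reduce shares
    tasks = [
        TaskSnapshot("r1", reduce_weights, 1, 0.5, 60.0),
        TaskSnapshot("r2", reduce_weights, 0, 0.4, 60.0),
        TaskSnapshot("r3", reduce_weights, 2, 0.5, 60.0),
    ]
    for t in tasks:
        print(t.task_id, round(remaining_time(t), 1))
    print("stragglers:", find_stragglers(tasks))
```

The quality of the estimate depends entirely on the stage weights: if a stage's true share of the run time differs from its assumed weight, the progress score and therefore the remaining time are skewed, which is the gap a learned per-stage estimator is meant to close.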