Paper Title
A Scheduling Algorithm to Maximize Storm Throughput in Heterogeneous Cluster
Paper Authors
Abstract
In the most popular distributed stream processing frameworks (DSPFs), programs are modeled as directed acyclic graphs. This model lets a DSPF exploit the parallelism of distributed clusters. However, choosing the proper number of vertices for each operator and finding an appropriate mapping between these vertices and processing resources have a decisive effect on overall throughput and resource utilization, and the simplicity of current DSPF schedulers causes these frameworks to perform poorly on large-scale clusters. In this paper, we present the design and implementation of a heterogeneity-aware scheduling algorithm that finds the proper number of vertices for an application graph and maps them to the most suitable cluster nodes. We scale up the application graph over a given cluster gradually, by increasing the topology input rate and creating new instances of bottlenecked vertices. Our experimental results on the Storm Micro-Benchmark show that (1) the prediction model estimates CPU utilization with 92% accuracy; (2) compared to Storm's default scheduler, our scheduler improves throughput by 7% to 44%; and (3) the proposed method finds a solution within 4% (worst case) of the optimal scheduler, which obtains the best schedule through an exhaustive search of the problem's design space.
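The gradual scale-up idea in the abstract (raise the input rate, then add an instance of whichever vertex becomes the bottleneck) can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not give its details, and the paper relies on a learned CPU-utilization prediction model. The sketch below assumes a much simpler linear cost model, and every name in it (`scale_up`, `costs`, `capacity`, `rate_step`, `max_instances`) is hypothetical. It also assumes that every operator sees the full input rate, that an operator's instances split that rate evenly, and that all per-tuple costs are positive.

```python
def node_loads(placement, costs, rate):
    """CPU demand per node when each operator splits `rate` evenly over its instances."""
    loads = {}
    for op, hosts in placement.items():
        share = costs[op] * rate / len(hosts)  # demand of one instance of `op`
        for host in hosts:
            loads[host] = loads.get(host, 0.0) + share
    return loads


def max_utilization(placement, costs, capacity, rate):
    """Highest load/capacity ratio over all nodes (above 1.0 means overloaded)."""
    loads = node_loads(placement, costs, rate)
    return max(loads.get(n, 0.0) / capacity[n] for n in capacity)


def scale_up(costs, capacity, rate_step=10.0, max_instances=16):
    """Greedy scale-up under an assumed linear cost model.

    costs:    operator -> CPU units per tuple (assumed known and positive)
    capacity: node -> CPU units per second (differs per node: heterogeneity)
    Returns (highest sustainable rate found, operator -> list of host nodes).
    """
    fastest = max(capacity, key=capacity.get)
    placement = {op: [fastest] for op in costs}  # one instance per operator
    rate = 0.0
    while True:
        trial_rate = rate + rate_step
        trial = {op: list(hosts) for op, hosts in placement.items()}
        while max_utilization(trial, costs, capacity, trial_rate) > 1.0:
            if sum(len(hosts) for hosts in trial.values()) >= max_instances:
                return rate, placement  # instance budget spent: keep last good state
            loads = node_loads(trial, costs, trial_rate)
            hot = max(capacity, key=lambda n: loads.get(n, 0.0) / capacity[n])
            # Bottleneck vertex: the operator contributing most load to the hot node.
            bottleneck = max(
                (op for op in trial if hot in trial[op]),
                key=lambda op: trial[op].count(hot) * costs[op] / len(trial[op]),
            )
            # Host the new instance where it minimizes the worst node utilization.
            best = min(
                capacity,
                key=lambda n: max_utilization(
                    {**trial, bottleneck: trial[bottleneck] + [n]},
                    costs, capacity, trial_rate,
                ),
            )
            trial[bottleneck].append(best)
        rate, placement = trial_rate, trial


if __name__ == "__main__":
    # Hypothetical two-operator topology on a three-node heterogeneous cluster.
    rate, placement = scale_up(
        costs={"split": 0.5, "count": 1.0},
        capacity={"fast": 200.0, "medium": 100.0, "slow": 50.0},
    )
    print(rate, placement)
```

Capping the total number of instances guarantees termination in this sketch. A real scheduler would instead stop when measured or predicted utilization shows that no further instance can raise throughput.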