Paper Title

Tuneful: An Online Significance-Aware Configuration Tuner for Big Data Analytics

Authors

Ayat Fekry, Lucian Carata, Thomas Pasquier, Andrew Rice, Andy Hopper

Abstract

Distributed analytics engines such as Spark are a common choice for processing extremely large datasets. However, finding good configurations for these systems remains challenging, with each workload potentially requiring a different setup to run optimally. Using suboptimal configurations incurs significant extra runtime costs. Furthermore, Spark and similar platforms are gaining traction within data science communities, where awareness of such issues is relatively low. We propose Tuneful, an approach that efficiently tunes the configuration of in-memory cluster computing systems. Tuneful combines incremental sensitivity analysis and Bayesian optimization to identify near-optimal configurations in a high-dimensional search space using a small number of executions. This setup allows the tuning to be done online, without any prior training. Our experimental results show that Tuneful reduces the search time for finding close-to-optimal configurations by 62% (at the median) compared to existing state-of-the-art techniques. This means that the tuning cost is amortized significantly faster, enabling practical tuning for new classes of workloads.
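
The amortization claim can be made concrete with a small break-even calculation. The sketch below is illustrative only and is not taken from the paper: the function name `runs_to_break_even` and all numbers are hypothetical, and it assumes that tuning overhead is proportional to search time, so a 62% shorter search means a 62% smaller overhead.

```python
import math


def runs_to_break_even(tuning_overhead_s, default_runtime_s, tuned_runtime_s):
    """Return how many runs with the tuned configuration are needed
    before the time saved per run repays the time spent tuning.

    All arguments are in seconds. This is a hypothetical helper for
    illustration, not part of Tuneful.
    """
    saving_per_run = default_runtime_s - tuned_runtime_s
    if saving_per_run <= 0:
        raise ValueError("tuned configuration must be faster than the default")
    # Round up: a partial run cannot repay the remaining overhead.
    return math.ceil(tuning_overhead_s / saving_per_run)


# Hypothetical workload: 600 s with the default configuration, 400 s once tuned.
# A baseline tuner is assumed to spend 10000 s of extra time searching;
# a tuner with a 62% shorter search would spend 3800 s.
baseline_runs = runs_to_break_even(10000, 600, 400)
faster_runs = runs_to_break_even(3800, 600, 400)
print(baseline_runs, faster_runs)
```

Under these assumed numbers the shorter search pays for itself after 19 runs instead of 50, which is why a workload that runs only a few dozen times can become worth tuning.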
