AutoWeka4MCPS-AVATAR: Accelerating Automated Machine Learning Pipeline Composition and Optimisation

Paper title

AutoWeka4MCPS-AVATAR: Accelerating Automated Machine Learning Pipeline Composition and Optimisation

Authors

Tien-Dung Nguyen, Bogdan Gabrys, Katarzyna Musial

Abstract

Automated machine learning (ML) pipeline composition and optimisation aim at automating the process of finding the most promising ML pipelines within allocated resources (i.e., time, CPU and memory). Existing methods, such as Bayesian-based and genetic-based optimisation, which are implemented in Auto-Weka, Auto-sklearn and TPOT, evaluate pipelines by executing them. As a result, pipeline composition and optimisation with these methods frequently requires a tremendous amount of time, which prevents them from exploring complex pipelines to find better predictive models. To further explore this research challenge, we have conducted experiments showing that many of the generated pipelines are invalid in the first place, and attempting to execute them is a waste of time and resources. To address this issue, we propose a novel method to evaluate the validity of ML pipelines without executing them, using a surrogate model (AVATAR). AVATAR generates a knowledge base by automatically learning the capabilities and effects of ML algorithms on datasets' characteristics. This knowledge base is used for a simplified mapping from an original ML pipeline to a surrogate model, which is a Petri net-based pipeline. Instead of executing the original ML pipeline to evaluate its validity, AVATAR evaluates its surrogate model, which is constructed from the capabilities and effects of the ML pipeline components and the simplified input/output mappings. Evaluating this surrogate model is less resource-intensive than executing the original pipeline. As a result, AVATAR enables pipeline composition and optimisation methods to evaluate more pipelines by quickly rejecting invalid ones. We integrate AVATAR into sequential model-based algorithm configuration (SMAC). Our experiments show that SMAC finds better solutions when it employs AVATAR than when it runs on its own.
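The core idea, checking a pipeline's validity by simulating a cheap surrogate instead of executing it, can be illustrated with a minimal Python sketch. It assumes that dataset characteristics are reduced to boolean flags carried by a token, and that each component is a transition in a linear Petri net: its capabilities decide whether it may fire on the incoming token, and its effects rewrite the token it outputs. The `Transition` and `validate` helpers, the component names and the characteristics are all hypothetical; this is not AVATAR's implementation or its learned knowledge base.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical simplified dataset characteristics carried by a Petri net token.
Token = Dict[str, bool]


@dataclass
class Transition:
    """One pipeline component modelled as a Petri net transition (hypothetical)."""
    name: str
    # Capabilities: characteristics the component cannot accept when present (True).
    unsupported: Tuple[str, ...] = ()
    # Effects: characteristic values written to the output token after firing.
    effects: Dict[str, bool] = field(default_factory=dict)

    def enabled(self, token: Token) -> bool:
        # The transition may fire only if no unsupported characteristic is present.
        return not any(token.get(c, False) for c in self.unsupported)

    def fire(self, token: Token) -> Token:
        # Produce the output token without running the real algorithm.
        out = dict(token)
        out.update(self.effects)
        return out


def validate(pipeline: List[Transition], token: Token) -> Tuple[bool, Optional[str]]:
    """Simulate the chain place_0 -> t_1 -> place_1 -> ... and return
    (True, None) if every transition fires, else (False, first blocked name)."""
    for t in pipeline:
        if not t.enabled(token):
            return False, t.name
        token = t.fire(token)
    return True, None


# Hypothetical components and dataset characteristics, for illustration only.
imputer = Transition("impute", effects={"missing_values": False})
encoder = Transition("encode_nominal", effects={"nominal_attributes": False})
clf = Transition("numeric_only_classifier",
                 unsupported=("missing_values", "nominal_attributes"))

data = {"missing_values": True, "nominal_attributes": True}

print(validate([clf], data))                    # (False, 'numeric_only_classifier')
print(validate([imputer, clf], data))           # (False, 'numeric_only_classifier')
print(validate([imputer, encoder, clf], data))  # (True, None)
```

In an optimiser loop such as SMAC, a check of this kind would run before the expensive training step, so that a candidate rejected by the surrogate can be scored as a failure immediately instead of being executed.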
