Modeling Quality and Machine Learning Pipelines through Extended Feature Models

Paper title

Modeling Quality and Machine Learning Pipelines through Extended Feature Models

Authors

d'Aloisio, Giordano; Di Marco, Antinisca; Stilo, Giovanni

Abstract

The recent increase in the complexity of Machine Learning (ML) methods has made it necessary to lighten both research and industrial development processes. ML pipelines have become an essential tool for domain experts, data scientists, and researchers, allowing them to easily combine several ML models to cover the full analytic process starting from raw datasets. Over the years, several solutions have been proposed to automate the building of ML pipelines, most of them focused on the semantic aspects and characteristics of the input dataset. However, an approach that takes into account the new quality concerns of ML systems (such as fairness, interpretability, and privacy) is still missing. In this paper, we first identify, from the literature, the key quality attributes of ML systems. We then propose a new engineering approach for quality ML pipelines, obtained by properly extending the Feature Models meta-model. The presented approach allows modeling ML pipelines, their quality requirements (on the whole pipeline and on single phases), and the quality characteristics of the algorithms used to implement each pipeline phase. Finally, we demonstrate the expressiveness of our model by considering the classification problem.
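
To make the idea in the abstract more concrete, the following is a minimal Python sketch of a pipeline whose phases offer alternative algorithms annotated with quality characteristics. It is not the authors' meta-model: the class names (`Algorithm`, `Phase`, `Pipeline`), the functions `admissible` and `satisfies`, the example algorithms and their quality labels are all hypothetical. In particular, the rule that a pipeline-level requirement is met when at least one selected algorithm offers it is an assumption made only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Algorithm:
    """One concrete algorithm that can implement a phase (hypothetical class)."""
    name: str
    qualities: Set[str] = field(default_factory=set)  # quality characteristics it offers


@dataclass
class Phase:
    """A pipeline phase with alternative algorithms and phase-level requirements."""
    name: str
    alternatives: List[Algorithm]
    required: Set[str] = field(default_factory=set)


@dataclass
class Pipeline:
    """An ordered list of phases plus requirements on the whole pipeline."""
    phases: List[Phase]
    required: Set[str] = field(default_factory=set)


def admissible(pipeline: Pipeline) -> Dict[str, List[str]]:
    """For each phase, list the algorithms that meet that phase's requirements."""
    return {
        phase.name: [a.name for a in phase.alternatives if phase.required <= a.qualities]
        for phase in pipeline.phases
    }


def satisfies(pipeline: Pipeline, selection: Dict[str, str]) -> bool:
    """Check one selection (phase name -> algorithm name) against all requirements.

    Assumption: a pipeline-level requirement holds if any selected algorithm offers it.
    """
    offered: Set[str] = set()
    for phase in pipeline.phases:
        chosen = next(
            (a for a in phase.alternatives if a.name == selection.get(phase.name)), None
        )
        if chosen is None or not phase.required <= chosen.qualities:
            return False
        offered |= chosen.qualities
    return pipeline.required <= offered


# Illustrative classification pipeline; the quality labels are made up for the example.
pipeline = Pipeline(
    phases=[
        Phase("preprocessing", [Algorithm("scaling"), Algorithm("reweighing", {"fairness"})]),
        Phase(
            "classification",
            [Algorithm("logistic_regression", {"interpretability"}), Algorithm("random_forest")],
            required={"interpretability"},
        ),
    ],
    required={"fairness"},
)

print(admissible(pipeline))
print(satisfies(pipeline, {"preprocessing": "reweighing", "classification": "logistic_regression"}))
```

Representing quality characteristics as plain sets keeps the requirement check to a subset test. A real feature-model tool would also need cross-tree constraints and attribute values richer than presence or absence, which this sketch leaves out.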
