Paper Title

Management of Machine Learning Lifecycle Artifacts: A Survey

Authors

Marius Schlegel, Kai-Uwe Sattler

Abstract

The exploratory and iterative nature of developing and operating machine learning (ML) applications leads to a variety of artifacts, such as datasets, features, models, hyperparameters, metrics, software, configurations, and logs. To enable comparability, reproducibility, and traceability of these artifacts across the ML lifecycle steps and iterations, systems and tools have been developed to support their collection, storage, and management. It is often not obvious what precise functional scope such systems offer, which makes comparing candidates and estimating synergy effects between them quite challenging. In this paper, we aim to give an overview of systems and platforms that support the management of ML lifecycle artifacts. Based on a systematic literature review, we derive assessment criteria and apply them to a representative selection of more than 60 systems and platforms.
