Kafka-ML: connecting the data stream with ML/AI frameworks

Paper title

Kafka-ML: connecting the data stream with ML/AI frameworks

Authors

Cristian Martín, Peter Langendoerfer, Pouya Soltani Zarrin, Manuel Díaz, Bartolomé Rubio

Abstract

Machine Learning (ML) and Artificial Intelligence (AI) depend on data sources to train, improve, and make predictions through their algorithms. With the digital revolution and current paradigms like the Internet of Things, this information is turning from static data into continuous data streams. However, most of the ML/AI frameworks used nowadays are not fully prepared for this revolution. In this paper, we propose Kafka-ML, an open-source framework that enables the management of TensorFlow ML/AI pipelines through data streams (Apache Kafka). Kafka-ML provides an accessible and user-friendly Web User Interface where users can easily define ML models and then train, evaluate, and deploy them for inference. Kafka-ML itself and its deployed components are fully managed through containerization technologies, which ensure portability and easy distribution, along with other features such as fault tolerance and high availability. Finally, a novel approach has been introduced to manage and reuse data streams, which may lead to the (no) utilization of data storage and file systems.
