Paper Title

Lightweight Automated Feature Monitoring for Data Streams

Authors

João Conde, Ricardo Moreira, João Torres, Pedro Cardoso, Hugo R. C. Ferreira, Marco O. P. Sampaio, João Tiago Ascensão, Pedro Bizarro

Abstract

Monitoring the behavior of automated real-time stream processing systems has become one of the most relevant problems in real-world applications. Such systems have grown in complexity, relying heavily on high-dimensional input data and data-hungry Machine Learning (ML) algorithms. We propose a flexible system, Feature Monitoring (FM), that detects data drifts in such data sets, with a small and constant memory footprint and a small computational cost in streaming applications. The method is based on a multivariate statistical test and is data-driven by design (full reference distributions are estimated from the data). It monitors all features that are used by the system, while providing an interpretable feature ranking whenever an alarm occurs (to aid in root cause analysis). The computational and memory lightness of the system results from the use of Exponential Moving Histograms. In our experimental study, we analyze how the system's behavior depends on its parameters and, more importantly, show examples where it detects problems that are not directly related to a single feature. This illustrates how FM eliminates the need to add custom signals to detect specific types of problems and that monitoring the available space of features is often enough.
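
The abstract attributes the constant memory footprint to Exponential Moving Histograms but does not spell out how they are built. The following is a minimal sketch of the general idea, not the authors' implementation: a fixed-bin histogram whose counts decay geometrically on every update, so that memory stays constant regardless of stream length. The class name `ExponentialMovingHistogram`, the `half_life` parameter, the bin layout, and the use of the Jensen-Shannon divergence as a per-feature drift score are all assumptions made for illustration; the paper's actual multivariate test is not described in the abstract.

```python
import bisect
import math


class ExponentialMovingHistogram:
    """Fixed-bin histogram whose counts decay geometrically on every update.

    Hypothetical illustration; names and parameters are not from the paper.
    """

    def __init__(self, edges, half_life):
        # `edges` are sorted interior bin boundaries, giving len(edges) + 1 bins;
        # the first and last bins are open-ended.
        self.edges = list(edges)
        self.counts = [0.0] * (len(self.edges) + 1)
        # Decay chosen so that an event's weight halves after `half_life` events.
        self.decay = 0.5 ** (1.0 / half_life)

    def update(self, value):
        # Age every bin, then add unit weight to the bin containing `value`.
        self.counts = [c * self.decay for c in self.counts]
        self.counts[bisect.bisect_right(self.edges, value)] += 1.0

    def frequencies(self):
        # Normalize decayed counts into a discrete distribution.
        total = sum(self.counts)
        return [c / total for c in self.counts] if total > 0 else list(self.counts)


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, in [0, 1]) between two distributions.

    Assumed drift score for this sketch only.
    """
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Usage: compare a recent-window histogram against a reference distribution.
reference = [0.0, 1.0, 0.0, 0.0]  # assumed reference frequencies for one feature
hist = ExponentialMovingHistogram(edges=[0.0, 1.0, 2.0], half_life=100)
for v in [0.5] * 200:
    hist.update(v)
score = js_divergence(hist.frequencies(), reference)
```

Because each update touches only a fixed number of bins, both memory and per-event cost are independent of how many events have been seen, which matches the lightweight property the abstract describes. Computing one such score per feature would also give a natural ranking of features when an alarm fires, though how FM actually ranks features is not stated here.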
