Paper title
Suitability of Different Metric Choices for Concept Drift Detection
Authors
Abstract
Concept drift refers to the phenomenon that the distribution underlying the observed data changes over time; as a consequence, machine learning models may become inaccurate and need adjustment. Many unsupervised approaches to drift detection rely on measuring the discrepancy between the sample distributions of two time windows. This may be done directly, after some preprocessing (feature extraction, embedding into a latent space, etc.), or with respect to inferred features (mean, variance, conditional probabilities, etc.). Most drift detection methods can be distinguished by the metric they use, how this metric is estimated, and how the decision threshold is found. In this paper, we analyze structural properties of drift-induced signals in the context of different metrics. We compare different types of estimators and metrics theoretically and empirically, and investigate the relevance of the individual metric components. In addition, we propose new choices and demonstrate their suitability in several experiments.
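The three ingredients named in the abstract (a metric, an estimator for it, and a decision threshold) can be illustrated with a minimal two-window sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the metric is the squared maximum mean discrepancy (MMD) with a Gaussian kernel, the estimator is the biased plug-in estimate, the threshold comes from a permutation test, and the names `rbf_mmd2` and `drift_p_value` as well as the synthetic windows are hypothetical.

```python
import numpy as np


def rbf_mmd2(x, y, bandwidth=1.0):
    """Biased plug-in estimate of squared MMD with a Gaussian (RBF) kernel.

    Illustrative choice of metric and estimator, not the paper's proposal.
    """
    z = np.vstack([x, y])
    sq_norms = np.sum(z ** 2, axis=1)
    # Pairwise squared Euclidean distances; clip tiny negatives from rounding.
    d2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * z @ z.T, 0.0)
    k = np.exp(-d2 / (2.0 * bandwidth ** 2))
    n = len(x)
    return k[:n, :n].mean() + k[n:, n:].mean() - 2.0 * k[:n, n:].mean()


def drift_p_value(x, y, n_perm=200, seed=0):
    """Permutation-test p-value for the hypothesis of no drift between windows."""
    rng = np.random.default_rng(seed)
    observed = rbf_mmd2(x, y)
    z = np.vstack([x, y])
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(z))
        # Re-split the pooled sample at random and recompute the statistic.
        if rbf_mmd2(z[idx[:n]], z[idx[n:]]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Synthetic example: the second window has a shifted mean (drift).
rng = np.random.default_rng(42)
window_a = rng.normal(0.0, 1.0, size=(100, 2))
window_b = rng.normal(2.0, 1.0, size=(100, 2))
p_drift = drift_p_value(window_a, window_b)
print(f"p-value under drift: {p_drift:.4f}")
```

A small p-value suggests that the two windows were drawn from different distributions. Swapping `rbf_mmd2` for another discrepancy measure, or the permutation test for a different thresholding rule, changes exactly the components the abstract says distinguish drift detection methods.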