Paper title
stream-learn -- open-source Python library for difficult data stream batch analysis
Paper authors
Abstract
stream-learn is a Python package, compatible with scikit-learn, developed for the analysis of drifting and imbalanced data streams. Its main component is a stream generator that produces synthetic data streams incorporating any of the three main types of concept drift (sudden, gradual, and incremental) in recurring or non-recurring versions. The package supports experiments that follow the established evaluation methodologies Test-Then-Train and Prequential. It also implements estimators adapted for data stream classification, including simple classifiers as well as state-of-the-art chunk-based and online classifier ensembles. To improve computational efficiency, the package uses its own implementations of prediction metrics for imbalanced binary classification tasks.
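The Test-Then-Train protocol named in the abstract can be illustrated with a minimal, self-contained sketch. It deliberately does not use the stream-learn API: `make_stream`, `ChunkCentroid`, `balanced_accuracy`, and `run_test_then_train` are hypothetical names introduced here for illustration, and the toy stream (one feature, one sudden drift, roughly 20% minority class) is an assumption, not the package's generator.

```python
import random


def make_stream(n_chunks=20, chunk_size=100, drift_at=10, minority=0.2, seed=0):
    """Yield (X, y) chunks of a 1-D imbalanced binary stream with one sudden drift.

    Hypothetical toy generator: the class means swap at chunk `drift_at`.
    """
    rng = random.Random(seed)
    for i in range(n_chunks):
        # Before the drift class 1 is centred at +2, afterwards at -2.
        sign = 1.0 if i < drift_at else -1.0
        X, y = [], []
        for _ in range(chunk_size):
            label = 1 if rng.random() < minority else 0
            mean = sign * (2.0 if label == 1 else -2.0)
            X.append(rng.gauss(mean, 1.0))
            y.append(label)
        yield X, y


class ChunkCentroid:
    """Nearest-centroid classifier whose centroids are refitted on each new chunk."""

    def __init__(self):
        self.centroids = {}

    def partial_fit(self, X, y):
        # Overwrite the centroid of every class present in this chunk.
        for label in set(y):
            values = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = sum(values) / len(values)
        return self

    def predict(self, X):
        # Assign each sample to the class with the closest centroid.
        return [
            min(self.centroids, key=lambda c: abs(x - self.centroids[c]))
            for x in X
        ]


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, a common metric for imbalanced problems."""
    recalls = []
    for label in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def run_test_then_train(stream, clf, metric):
    """Score each chunk with the current model, then train on that same chunk."""
    scores = []
    for i, (X, y) in enumerate(stream):
        if i > 0:  # the first chunk is used for training only
            scores.append(metric(y, clf.predict(X)))
        clf.partial_fit(X, y)
    return scores


scores = run_test_then_train(make_stream(), ChunkCentroid(), balanced_accuracy)
```

In this sketch `scores` holds one value per chunk after the first. The score for the chunk at which the drift occurs should drop sharply, because the model was fitted on the previous concept, and it should recover once the classifier has been retrained on a post-drift chunk.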