Paper Title

An empirical investigation of different classifiers, encoding and ensemble schemes for next event prediction using business process event logs

Authors

Tama, Bayu Adhi; Comuzzi, Marco; Ko, Jonghyeon

Abstract

There is a growing need for empirical benchmarks that support researchers and practitioners in selecting the best machine learning technique for a given prediction task. In this paper, we consider the next event prediction task in business process predictive monitoring, and we extend our previously published benchmark by studying how different encoding windows and the use of ensemble schemes affect performance. The choice of whether to use ensembles, and which scheme to use, often depends on the type of data and the classification task. While there is a general understanding that ensembles perform well in predictive monitoring of business processes, next event prediction is a task for which no other benchmarks involving ensembles are available. The proposed benchmark helps researchers select a high-performing individual classifier or ensemble scheme given the variability at the case level of the event log under consideration. Experimental results show that choosing an optimal number of events for feature encoding is challenging, so each event log must be considered individually when selecting an optimal value. Ensemble schemes improve the performance of low-performing classifiers in this task, such as SVM, whereas high-performing classifiers, such as tree-based classifiers, are not better off when ensemble schemes are considered.
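
To make the notion of an encoding window concrete, the following is a minimal sketch of how one trace of an event log might be turned into training samples for next event prediction. The abstract does not specify the encoding the authors use, so the function name `encode_windows`, the padding symbol `PAD`, and the example activity names are all assumptions made purely for illustration.

```python
from typing import List, Tuple

# Hypothetical padding symbol for prefixes shorter than the window (an assumption).
PAD = "<pad>"


def encode_windows(trace: List[str], window: int) -> List[Tuple[List[str], str]]:
    """Build (features, label) pairs from one trace.

    features: the last `window` events seen so far, left-padded with PAD.
    label:    the event that actually came next.
    """
    samples = []
    for i in range(1, len(trace)):
        # Keep only the most recent `window` events of the prefix trace[:i].
        prefix = trace[max(0, i - window):i]
        features = [PAD] * (window - len(prefix)) + prefix
        samples.append((features, trace[i]))
    return samples


# Illustrative trace; the activity names are made up.
trace = ["register", "check", "approve", "pay"]
samples = encode_windows(trace, window=2)
for features, label in samples:
    print(features, "->", label)
```

Changing `window` changes how much history each sample carries, which is the parameter the abstract reports as hard to choose globally and best tuned per event log.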
