Paper Title

SELECTOR: Selecting a Representative Benchmark Suite for Reproducible Statistical Comparison

Paper Authors

Gjorgjina Cenikj, Ryan Dieter Lang, Andries Petrus Engelbrecht, Carola Doerr, Peter Korošec, Tome Eftimov

Paper Abstract

Fair algorithm evaluation is conditioned on the existence of high-quality benchmark datasets that are non-redundant and are representative of typical optimization scenarios. In this paper, we evaluate three heuristics for selecting diverse problem instances which should be involved in the comparison of optimization algorithms in order to ensure robust statistical algorithm performance analysis. The first approach employs clustering to identify similar groups of problem instances and subsequent sampling from each cluster to construct new benchmarks, while the other two approaches use graph algorithms for identifying dominating and maximal independent sets of nodes. We demonstrate the applicability of the proposed heuristics by performing a statistical performance analysis of five portfolios consisting of three optimization algorithms on five of the most commonly used optimization benchmarks. The results indicate that the statistical analyses of the algorithms' performance, conducted on each benchmark separately, produce conflicting outcomes, which can be used to give a false indication of the superiority of one algorithm over another. On the other hand, when the analysis is conducted on the problem instances selected with the proposed heuristics, which uniformly cover the problem landscape, the statistical outcomes are robust and consistent.
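To illustrate the second family of heuristics described above, here is a minimal sketch (not the authors' implementation) of greedily extracting a maximal independent set from a similarity graph. Nodes are problem instances; an edge marks two instances as redundant (e.g., landscape features above some similarity threshold, a hypothetical preprocessing step). The selected nodes are then pairwise dissimilar, and since a maximal independent set is also a dominating set, every discarded instance stays "represented" by a similar selected one.

```python
def maximal_independent_set(n_instances, redundancy_edges):
    """Greedily select a maximal independent set of problem instances.

    n_instances: number of problem instances (nodes 0..n-1).
    redundancy_edges: pairs (u, v) meaning instances u and v are too similar.
    Returns a list of selected, mutually dissimilar instance indices.
    """
    # Build an adjacency map of the similarity graph.
    adjacent = {i: set() for i in range(n_instances)}
    for u, v in redundancy_edges:
        adjacent[u].add(v)
        adjacent[v].add(u)

    selected, excluded = [], set()
    for node in range(n_instances):
        if node in excluded:
            continue
        # Keep this instance; drop all instances similar to it.
        selected.append(node)
        excluded.add(node)
        excluded |= adjacent[node]
    return selected


# Toy benchmark of 5 instances; instances 0-1, 1-2, and 3-4 are redundant.
print(maximal_independent_set(5, [(0, 1), (1, 2), (3, 4)]))  # → [0, 2, 3]
```

The greedy pass is linear in the graph size, but the result depends on node order; the paper's heuristics may differ in how ties and orderings are handled.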
