Paper title
A metaheuristic multi-objective interaction-aware feature selection method
Paper authors
Paper abstract
Multi-objective feature selection is one of the most significant problems in pattern recognition. It is challenging because it must maximize classification performance while minimizing the number of selected features, and these two objectives usually conflict. Metaheuristic optimization methods are widely used to obtain better Pareto-optimal solutions, but their main drawback is the need to explore a large search space. Another problem with multi-objective feature selection approaches is the interaction between features: selecting correlated features has a negative effect on classification performance. To tackle these problems, we present a novel multi-objective feature selection method with several advantages. First, it accounts for the interaction between features using an advanced probability scheme. Second, it is based on the Pareto Archived Evolution Strategy (PAES), which is simple and explores the solution space quickly. We modify the structure of PAES so that it generates offspring intelligently: the proposed method uses the introduced probability scheme to produce more promising offspring. Finally, it is equipped with a novel strategy that guides it toward the optimal number of features during the evolutionary process. Experimental results on different real-world datasets show a significant improvement in finding the optimal Pareto front compared to state-of-the-art methods.
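To make the two-objective setting concrete, the following is a minimal sketch of a (1+1) PAES-style loop over feature bitmasks, written under several assumptions and not taken from the paper. The abstract does not specify the probability scheme, so the per-feature `flip_prob` vector is a hypothetical stand-in for it. The `toy_evaluate` function is an invented example objective, and the adaptive crowding grid of the original PAES is omitted for brevity, so a child replaces the parent whenever it enters the archive.

```python
import random


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive, cand):
    """Add (mask, objectives) if no member dominates or equals it.

    Members dominated by the candidate are removed. Returns True on insertion.
    """
    _, obj = cand
    if any(dominates(o, obj) or o == obj for _, o in archive):
        return False
    archive[:] = [(m, o) for m, o in archive if not dominates(obj, o)]
    archive.append(cand)
    return True


def mutate(mask, flip_prob, rng):
    """Flip bit i with probability flip_prob[i] (hypothetical probability scheme)."""
    return tuple(b ^ (rng.random() < p) for b, p in zip(mask, flip_prob))


def paes(evaluate, n_features, flip_prob, iters=200, seed=0):
    """Simplified (1+1) PAES without the crowding grid; returns the archive."""
    rng = random.Random(seed)
    current = tuple(rng.randint(0, 1) for _ in range(n_features))
    cur_obj = evaluate(current)
    archive = [(current, cur_obj)]
    for _ in range(iters):
        child = mutate(current, flip_prob, rng)
        child_obj = evaluate(child)
        if dominates(cur_obj, child_obj):
            continue  # the parent dominates the child, so the child is discarded
        if update_archive(archive, (child, child_obj)):
            current, cur_obj = child, child_obj  # simplification: no crowding test
    return archive


def toy_evaluate(mask):
    """Invented objective: features 0 and 1 reduce error, the rest add noise.

    Returns (error in percent, number of selected features), both minimized.
    """
    error = 50 - 20 * mask[0] - 20 * mask[1] + sum(mask[2:])
    return (error, sum(mask))


if __name__ == "__main__":
    front = paes(toy_evaluate, n_features=6, flip_prob=[0.3] * 6)
    for mask, obj in sorted(front, key=lambda item: item[1]):
        print(mask, obj)
```

In a real experiment, `toy_evaluate` would be replaced by a classifier's validation error on the selected columns, and `flip_prob` would be derived from whatever interaction measure the method uses; both choices are outside what the abstract states.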