Paper Title

An Adaptive Neighborhood Partition Full Conditional Mutual Information Maximization Method for Feature Selection

Authors

Wang, Gaoshuai; Lauri, Fabrice; Wang, Pu; Luo, Hongyuan; El Hassani, Amir Hajjam

Abstract

Feature selection eliminates redundant features and keeps relevant ones; it can enhance a machine learning algorithm's performance and accelerate computation. Among the various approaches, mutual information has attracted increasing attention as an effective criterion for measuring variable correlation. However, current works mainly focus on maximizing feature relevancy with the class label and minimizing feature redundancy within the selected features. We argue that pursuing feature redundancy minimization is reasonable but not necessary, because part of the so-called redundant features also carries useful information that promotes performance. Regarding mutual information calculation, it may distort the true relationship between two variables without a proper neighborhood partition; traditional methods usually split continuous variables into several intervals, or even ignore this influence altogether. We theoretically prove how variable fluctuation negatively influences mutual information calculation. To remove these obstacles, we propose a full conditional mutual information maximization method (FCMIM) for feature selection, which considers feature relevancy in only two aspects. To obtain a better partition effect and eliminate the negative influence of attribute fluctuation, we propose an adaptive neighborhood partition algorithm (ANP) driven by feedback from the mutual information maximization algorithm; this backpropagation process helps search for a proper neighborhood partition parameter. We compare our method with several mutual information methods on 17 benchmark datasets. FCMIM outperforms the other methods across different classifiers, and the results show that ANP improves the performance of nearly all the mutual information methods.
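The abstract's point that an improper neighborhood partition can distort the estimated relationship between two variables can be illustrated with a small histogram-based sketch. This is not the paper's ANP algorithm; it is a minimal, self-contained example (function name `mi_with_bins` and all parameter values are our own) showing that the mutual information estimate between two correlated continuous variables depends heavily on how finely they are discretized.

```python
# Illustrative sketch (not the paper's ANP method): how the choice of
# discretization granularity changes a mutual information estimate.
import numpy as np

def mi_with_bins(x, y, bins):
    """Estimate I(X;Y) in nats from a 2-D histogram with `bins` bins per axis."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                  # joint distribution p(x, y)
    px = pxy.sum(axis=1, keepdims=True)        # marginal p(x), column vector
    py = pxy.sum(axis=0, keepdims=True)        # marginal p(y), row vector
    nz = pxy > 0                               # avoid log(0) on empty cells
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x + 0.5 * rng.normal(size=2000)            # y is strongly correlated with x

# The estimate varies with the partition: too few bins smooth the
# dependence away, while too many bins inflate the estimate with noise.
for bins in (2, 10, 100):
    print(bins, round(mi_with_bins(x, y, bins), 3))
```

The spread of estimates across bin counts is exactly the partition sensitivity the paper targets; ANP, as described in the abstract, searches for the partition parameter adaptively using feedback from the mutual information maximization step rather than fixing it in advance.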
