Paper title
A Synthetic Over-sampling method with Minority and Majority classes for imbalance problems
Paper authors
Paper abstract
Class imbalance is a substantial challenge in classifying many real-world cases. Synthetic over-sampling methods have been effective in improving classifier performance on imbalance problems. However, most synthetic over-sampling methods generate non-diverse synthetic instances within the convex hull formed by the existing minority instances, because they concentrate only on the minority class and ignore the vast information provided by the majority class. They also often perform poorly on extremely imbalanced data: the fewer the minority instances, the less information is available to generate synthetic instances. Moreover, existing methods that generate synthetic instances using the distributional information of the majority class cannot perform effectively when the majority class has a multi-modal distribution. We propose a new method, Synthetic Over-sampling with Minority and Majority classes (SOMM), to generate diverse and adaptable synthetic instances. SOMM generates synthetic instances diversely within the minority data space and updates the generated instances adaptively to the neighbourhood, which includes both classes. Thus, SOMM performs well for both binary and multiclass imbalance problems. We examine the performance of SOMM on binary and multiclass problems using benchmark data sets with different imbalance levels. The empirical results show the superiority of SOMM over other existing methods.
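The convex-hull limitation mentioned in the abstract can be illustrated with a minimal sketch of conventional interpolation-based over-sampling (in the style of SMOTE). This is not the SOMM algorithm, whose details the abstract does not give. The function name `interpolate_minority`, its parameters, and the toy data are hypothetical and chosen only for illustration; the sketch assumes at least two minority points.

```python
import random

def interpolate_minority(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by SMOTE-style linear interpolation.

    Hypothetical helper for illustration only; it is NOT the SOMM method.
    Assumes `minority` holds at least two points of equal dimension.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        # The new point lies on the segment between base and neighbour,
        # so it can never leave the convex hull of the minority points.
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Toy minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = interpolate_minority(minority, n_new=20)

# Every synthetic point stays inside the unit square (the convex hull here).
inside = all(0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 for x, y in new_points)
print(len(new_points), inside)
```

Because each synthetic point is a convex combination of two existing minority points, no majority-class information enters the generation step. That is the gap the abstract says SOMM is designed to address.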