Deep Synthetic Minority Over-Sampling Technique

Paper Title

Deep Synthetic Minority Over-Sampling Technique

Authors

Hadi Mansourifar, Weidong Shi

Abstract

Synthetic Minority Over-sampling Technique (SMOTE) is the most popular over-sampling method. However, its random nature makes the synthesized data, and even the resulting imbalanced classification results, unstable: running SMOTE n times yields n different sets of synthesized instances and n different classification results. To address this problem, we adapt the SMOTE idea to a deep learning architecture. In this method, a deep neural network regression model is trained on the inputs and outputs of traditional SMOTE. Each input to the proposed deep regression model is a pair of randomly chosen data points concatenated into a double-size vector. The corresponding output is a randomly interpolated data point, of the original dimension, between the two chosen vectors. The experimental results show that Deep SMOTE can outperform traditional SMOTE in terms of precision, F1 score, and Area Under Curve (AUC) in the majority of test cases.
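The input and output format described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_deep_smote_training_set`, the uniform choice of the second point (classic SMOTE would pick among k nearest neighbours), and the NumPy-only setup are all assumptions made for illustration. The sketch only builds the training pairs; the deep regression model that would be fitted to them is not shown.

```python
import numpy as np


def build_deep_smote_training_set(minority, n_pairs, seed=0):
    """Build (input, target) pairs that imitate SMOTE interpolation.

    minority: array of shape (n_samples, n_features) holding minority-class
              points; needs at least two rows.
    Returns inputs of shape (n_pairs, 2 * n_features) and targets of shape
    (n_pairs, n_features).
    """
    rng = np.random.default_rng(seed)
    n_samples = minority.shape[0]

    # Assumption: both points are drawn uniformly at random. The offset in
    # [1, n_samples - 1] guarantees the second index differs from the first.
    idx_a = rng.integers(0, n_samples, size=n_pairs)
    offset = rng.integers(1, n_samples, size=n_pairs)
    idx_b = (idx_a + offset) % n_samples
    a, b = minority[idx_a], minority[idx_b]

    # SMOTE-style target: a random point on the segment between a and b,
    # with the same dimension as the original data.
    lam = rng.random((n_pairs, 1))
    targets = a + lam * (b - a)

    # Model input: the two points concatenated into a double-size vector.
    inputs = np.concatenate([a, b], axis=1)
    return inputs, targets
```

For a minority set with 4 features, each input row has 8 values and each target row has 4. Any multi-output regressor could then be trained on these pairs, so that generating a new sample becomes a forward pass on a concatenated pair of points.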
