CIGAN: A Python Package for Handling Class Imbalance using Generative Adversarial Networks

Paper Title

CIGAN: A Python Package for Handling Class Imbalance using Generative Adversarial Networks

Authors

Huang, Yuxiao; Ma, Yan

Abstract

A key challenge in machine learning is class imbalance, where the sample size of some classes (majority classes) is much larger than that of the other classes (minority classes). If we were to train a classifier directly on imbalanced data, the classifier would be more likely to predict a new sample as one of the majority classes. In the extreme case, the classifier could completely ignore the minority classes. This could have serious sociological implications in healthcare, as the minority classes are usually the disease classes (e.g., death or a positive clinical test result). In this paper, we introduce a software package that uses Generative Adversarial Networks to oversample the minority classes so as to improve downstream classification. To the best of our knowledge, this is the first tool that allows multi-class classification (where the target can have an arbitrary number of classes). The code of the tool is publicly available in our GitHub repository (https://github.com/yuxiaohuang/research/tree/master/gwu/working/cigan/code).
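
To make the multi-class imbalance described in the abstract concrete, the following is a minimal Python sketch that counts how many synthetic samples each class would need in order to match the largest class. It is not CIGAN's API: the function name `oversampling_targets` and the example labels are hypothetical, and the sketch only computes the per-class shortfall. Generating the synthetic samples themselves (with a GAN, as the paper proposes) is outside its scope.

```python
from collections import Counter


def oversampling_targets(labels):
    """Return, per class, how many extra samples are needed to match the largest class.

    Hypothetical helper for illustration only; not part of CIGAN.
    """
    counts = Counter(labels)          # samples observed per class
    largest = max(counts.values())    # size of the biggest (majority) class
    return {cls: largest - n for cls, n in counts.items()}


# Illustrative three-class target: one majority class, two minority classes.
labels = ["healthy"] * 90 + ["disease_a"] * 7 + ["disease_b"] * 3
targets = oversampling_targets(labels)
print(targets)  # {'healthy': 0, 'disease_a': 83, 'disease_b': 87}
```

Balancing every class up to the majority size is only one possible target; a smaller oversampling ratio may be preferable when the minority classes are very small.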
