Feature Space Augmentation for Long-Tailed Data

Paper title

Feature Space Augmentation for Long-Tailed Data

Authors

Peng Chu, Xiao Bian, Shaopeng Liu, Haibin Ling

Abstract

Real-world data often follow a long-tailed distribution, as the frequency of each class is typically different. For example, a dataset can have a large number of under-represented classes and a few classes with more than sufficient data. However, a model that represents the dataset is usually expected to perform reasonably uniformly across classes. Introducing class-balanced losses and advanced methods for data re-sampling and augmentation are among the best practices for alleviating the data imbalance problem. However, the other part of the problem, concerning the under-represented classes, has to rely on additional knowledge to recover the missing information. In this work, we present a novel approach to the long-tailed problem that augments the under-represented classes in the feature space with features learned from the classes with ample samples. In particular, we decompose the features of each class into a class-generic component and a class-specific component using class activation maps. Novel samples of under-represented classes are then generated on the fly during training by fusing the class-specific features from the under-represented classes with the class-generic features from confusing classes. Our results on different datasets, such as iNaturalist, ImageNet-LT, Places-LT and a long-tailed version of CIFAR, show state-of-the-art performance.
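
The decomposition and fusion described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact fusion rule, so the single activation threshold, the spatial masking scheme, and the names `class_activation_map` and `fuse_features` are all assumptions made for illustration. The sketch assumes a feature map of shape (channels, height, width) and a linear classifier whose weights have shape (num_classes, channels).

```python
import numpy as np


def class_activation_map(features, fc_weights, cls):
    """Class activation map for one sample, rescaled to [0, 1].

    features:   array of shape (C, H, W) from the last conv layer.
    fc_weights: array of shape (num_classes, C) from the linear classifier.
    """
    # Weighted sum over channels using the classifier weights of `cls`.
    cam = np.tensordot(fc_weights[cls], features, axes=([0], [0]))  # (H, W)
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def fuse_features(tail_feat, head_feat, fc_weights, tail_cls, head_cls,
                  threshold=0.5):
    """Build a synthetic tail-class feature map (hypothetical fusion rule).

    Assumption: locations where the tail sample's CAM is high are treated
    as class-specific; locations where the head sample's CAM is low are
    treated as class-generic.
    """
    tail_cam = class_activation_map(tail_feat, fc_weights, tail_cls)
    head_cam = class_activation_map(head_feat, fc_weights, head_cls)

    specific = tail_cam >= threshold   # (H, W) mask on the tail sample
    generic = head_cam < threshold     # (H, W) mask on the head sample

    # Keep only the class-generic part of the head (confusing) sample.
    background = head_feat * generic
    # Paste the tail sample's class-specific locations on top of it.
    return np.where(specific, tail_feat, background)
```

In a training loop, a fused map would be labelled with the tail class and passed to the classifier alongside the real samples. How the confusing head class is selected is not covered by this sketch.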
