Paper Title

Rethinking the Augmentation Module in Contrastive Learning: Learning Hierarchical Augmentation Invariance with Expanded Views

Authors

Junbo Zhang, Kaisheng Ma

Abstract

A data augmentation module is utilized in contrastive learning to transform the given data example into two views, which is considered essential and irreplaceable. However, the predetermined composition of multiple data augmentations brings two drawbacks. First, the artificial choice of augmentation types brings specific representational invariances to the model, which have different degrees of positive and negative effects on different downstream tasks. Treating each type of augmentation equally during training makes the model learn non-optimal representations for various downstream tasks and limits the flexibility to choose augmentation types beforehand. Second, the strong data augmentations used in classic contrastive learning methods may bring too much invariance in some cases, and fine-grained information that is essential to some downstream tasks may be lost. This paper proposes a general method to alleviate these two problems by considering where and what to contrast in a general contrastive learning framework. We first propose to learn different augmentation invariances at different depths of the model according to the importance of each data augmentation instead of learning representational invariances evenly in the backbone. We then propose to expand the contrast content with augmentation embeddings to reduce the misleading effects of strong data augmentations. Experiments based on several baseline methods demonstrate that we learn better representations for various benchmarks on classification, detection, and segmentation downstream tasks.
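To make the two ideas in the abstract concrete, below is a minimal PyTorch sketch, not the authors' released code: projection heads are attached at several backbone depths so that different augmentation invariances can be learned at different stages, and an embedding of the augmentation parameters is concatenated to each representation to form the "expanded view". All names (`HierarchicalContrastiveModel`, `aug_embed`, the four assumed augmentation parameters, the per-stage loss weighting) are hypothetical illustrations.

```python
# Hypothetical sketch of hierarchical augmentation invariance with
# expanded views. Assumes a backbone exposing four stage feature maps.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalContrastiveModel(nn.Module):
    """Contrast at several backbone depths; each projection head also
    consumes an embedding of the applied augmentation parameters."""

    def __init__(self, stage_dims=(64, 128, 256, 512), aug_dim=8, proj_dim=128):
        super().__init__()
        # One projection head per tapped backbone stage.
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d + aug_dim, proj_dim),
                nn.ReLU(inplace=True),
                nn.Linear(proj_dim, proj_dim),
            )
            for d in stage_dims
        )
        # Maps raw augmentation parameters (e.g. crop box, jitter
        # strength) to a learned embedding; 4 params is an assumption.
        self.aug_embed = nn.Linear(4, aug_dim)

    def forward(self, stage_feats, aug_params):
        # stage_feats: list of (B, C_i, H, W) feature maps, one per stage
        # aug_params:  (B, 4) augmentation parameters for this view
        a = self.aug_embed(aug_params)
        zs = []
        for feat, head in zip(stage_feats, self.heads):
            pooled = feat.mean(dim=(2, 3))  # global average pool -> (B, C_i)
            z = head(torch.cat([pooled, a], dim=1))  # "expanded" content
            zs.append(F.normalize(z, dim=1))
        return zs  # one projection per depth


def info_nce(z1, z2, t=0.2):
    """Standard InfoNCE loss between two batches of projections."""
    logits = z1 @ z2.t() / t
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)


# Toy usage: contrast each depth separately. Weighting the per-stage
# losses (here uniformly) is where each augmentation's invariance can
# be emphasized at shallower or deeper layers.
model = HierarchicalContrastiveModel()
feats_v1 = [torch.randn(8, c, 7, 7) for c in (64, 128, 256, 512)]
feats_v2 = [torch.randn(8, c, 7, 7) for c in (64, 128, 256, 512)]
p1, p2 = torch.rand(8, 4), torch.rand(8, 4)
loss = sum(info_nce(a, b) for a, b in zip(model(feats_v1, p1), model(feats_v2, p2)))
loss.backward()
```

Conditioning the heads on the augmentation parameters means a representation need not discard information that a strong augmentation destroyed; the head can account for the transformation explicitly, which is one plausible reading of how the expanded views reduce the misleading effect of strong augmentations.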
