Paper title
Unsupervised Visual Representation Learning by Synchronous Momentum Grouping
Paper authors
Abstract
In this paper, we propose a genuine group-level contrastive visual representation learning method whose linear evaluation performance on ImageNet surpasses that of vanilla supervised learning. The two mainstream unsupervised learning schemes are the instance-level contrastive framework and clustering-based schemes. The former adopts extremely fine-grained instance-level discrimination, whose supervisory signal is inefficient because of false negatives. Although the latter avoids this problem, clustering-based schemes commonly come with restrictions that hurt performance. To integrate the advantages of both, we design the SMoG method. SMoG follows the framework of contrastive learning but replaces the contrastive unit, moving from instances to groups, mimicking clustering-based methods. To achieve this, we propose a momentum grouping scheme that conducts feature grouping synchronously with representation learning. In this way, SMoG solves the supervisory signal hysteresis problem that clustering-based methods usually face, and reduces the false negatives of instance-level contrastive methods. We conduct exhaustive experiments showing that SMoG works well on both CNN and Transformer backbones. The results show that SMoG surpasses the current SOTA unsupervised representation learning methods. Moreover, its linear evaluation results surpass those obtained by vanilla supervised learning, and the learned representation transfers well to downstream tasks.
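The abstract describes the momentum grouping idea only in words, so the following is a minimal illustrative sketch of one plausible reading, not the authors' implementation. It assumes each group is represented by a unit-length prototype vector, that a feature is assigned to its most similar prototype, that the assigned prototype is nudged toward the feature with a momentum coefficient, and that the loss is a softmax cross-entropy over groups rather than over instances. The function names, `beta`, and `temperature` are hypothetical, and the encoder network and its gradient updates are omitted entirely.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def assign_group(feature, groups):
    """Return the index of the prototype most similar to the feature."""
    sims = [dot(feature, g) for g in groups]
    return max(range(len(groups)), key=lambda k: sims[k])


def momentum_update(groups, feature, k, beta=0.99):
    """Return new prototypes with prototype k moved toward the feature.

    Assumed rule: g_k <- normalize(beta * g_k + (1 - beta) * feature).
    """
    mixed = [beta * g + (1.0 - beta) * f for g, f in zip(groups[k], feature)]
    new_groups = list(groups)
    new_groups[k] = normalize(mixed)
    return new_groups


def group_contrastive_loss(feature, groups, target, temperature=0.1):
    """Cross-entropy over groups: pull toward the target, push from the rest."""
    logits = [dot(feature, g) / temperature for g in groups]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target]


def smog_step(view_a, view_b, groups, beta=0.99, temperature=0.1):
    """One synchronous step: group view_b, score view_a, update prototypes."""
    a, b = normalize(view_a), normalize(view_b)
    k = assign_group(b, groups)
    loss = group_contrastive_loss(a, groups, k, temperature)
    groups = momentum_update(groups, b, k, beta)
    return loss, k, groups


if __name__ == "__main__":
    prototypes = [[1.0, 0.0], [0.0, 1.0]]
    loss, k, prototypes = smog_step([0.9, 0.1], [0.8, 0.2], prototypes)
    print("assigned group:", k)
    print("loss:", loss)
    print("updated prototypes:", prototypes)
```

Because the prototypes are updated in the same step that produces the loss, the grouping never lags behind the features, which is the property the abstract contrasts with the delayed supervisory signal of clustering-based methods. Contrasting against groups instead of individual samples also means two similar images that fall into the same group are no longer treated as negatives of each other.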