Paper Title
Spatial Cross-Attention Improves Self-Supervised Visual Representation Learning
Paper Authors
Paper Abstract
Unsupervised representation learning methods such as SwAV have proven effective at learning the visual semantics of a target dataset. The main idea behind these methods is that different views of the same image represent the same semantics. In this paper, we further introduce an add-on module that facilitates the injection of knowledge about spatial cross-correlations among samples. This in turn distills intra-class information, including feature-level locations and cross-similarities between same-class instances. The proposed add-on can be attached to existing methods such as SwAV, and it can later be removed for inference without modifying the learned weights. Through an extensive set of empirical evaluations, we verify that our method improves class activation map detection, top-1 classification accuracy, and downstream tasks such as object detection, across different configuration settings.
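To make the abstract's idea of a removable spatial cross-attention add-on concrete, below is a minimal PyTorch sketch. It is an assumption about the general mechanism, not the paper's exact architecture: the module names, the stand-in backbone, and the embedding dimension `dim` are hypothetical. The sketch relates the spatial positions of two samples' (or two views') feature maps via cross-attention, and because it sits on top of an unchanged backbone, it can be dropped at inference time without touching the backbone's weights.

```python
# Minimal sketch (assumption, not the paper's exact architecture): a spatial
# cross-attention add-on that relates feature-map locations of two samples or
# two views. The backbone itself is unchanged, so the add-on can be removed at
# inference time.
import torch
import torch.nn as nn


class SpatialCrossAttention(nn.Module):
    """Cross-attention between the spatial positions of two feature maps.

    Hypothetical module and parameter names; illustrates the general idea of
    injecting spatial cross-correlation knowledge during training only.
    """

    def __init__(self, channels: int, dim: int = 128):
        super().__init__()
        self.query = nn.Conv2d(channels, dim, kernel_size=1)
        self.key = nn.Conv2d(channels, dim, kernel_size=1)
        self.value = nn.Conv2d(channels, dim, kernel_size=1)
        self.scale = dim ** -0.5

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # feat_a, feat_b: (B, C, H, W) backbone feature maps of two samples/views.
        b, _, h, w = feat_a.shape
        q = self.query(feat_a).flatten(2).transpose(1, 2)   # (B, H*W, dim)
        k = self.key(feat_b).flatten(2)                      # (B, dim, H*W)
        v = self.value(feat_b).flatten(2).transpose(1, 2)    # (B, H*W, dim)
        attn = torch.softmax(q @ k * self.scale, dim=-1)     # spatial cross-correlations
        out = attn @ v                                        # (B, H*W, dim)
        return out.transpose(1, 2).reshape(b, -1, h, w)       # (B, dim, H, W)


# Usage sketch: attach the add-on to a backbone during self-supervised training,
# e.g. alongside a SwAV-style objective; at inference, keep only the backbone.
backbone = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())  # stand-in backbone
add_on = SpatialCrossAttention(channels=64)

x1, x2 = torch.randn(2, 3, 32, 32), torch.randn(2, 3, 32, 32)
f1, f2 = backbone(x1), backbone(x2)
cross = add_on(f1, f2)   # used only to form an auxiliary training signal
print(cross.shape)       # torch.Size([2, 128, 32, 32])
```

Because the add-on only consumes backbone features and contributes an auxiliary training signal, discarding it leaves the backbone weights exactly as learned, which matches the abstract's claim that inference requires no weight modification.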