Dense Siamese Network for Dense Unsupervised Learning

Paper title

Dense Siamese Network for Dense Unsupervised Learning

Authors

Wenwei Zhang, Jiangmiao Pang, Kai Chen, Chen Change Loy

Abstract

This paper presents Dense Siamese Network (DenseSiam), a simple unsupervised learning framework for dense prediction tasks. It learns visual representations by maximizing the similarity between two views of one image with two types of consistency: pixel consistency and region consistency. Concretely, DenseSiam first maximizes pixel-level spatial consistency according to the exact location correspondence in the overlapped area. It also extracts a batch of region embeddings that correspond to sub-regions of the overlapped area and contrasts them for region consistency. In contrast to previous methods that require negative pixel pairs, momentum encoders, or heuristic masks, DenseSiam benefits from the simple Siamese network and optimizes consistency at different granularities. It also shows that simple location correspondence and interacted region embeddings are effective enough to learn the similarity. We apply DenseSiam on ImageNet and obtain competitive improvements on various downstream tasks. We also show that, with only some extra task-specific losses, the simple framework can directly perform dense prediction tasks. On an existing unsupervised semantic segmentation benchmark, it surpasses state-of-the-art segmentation methods by 2.1 mIoU with 28% of the training cost. Code and models are released at https://github.com/ZwwWayne/DenseSiam.
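
The pixel-consistency idea in the abstract can be illustrated with a small NumPy sketch. This is not the released DenseSiam code. It assumes both views are axis-aligned crops at the same scale with no flipping, that view boxes are given in feature-grid units, and that similarity is measured with negative cosine similarity, as is common in Siamese frameworks. The helper names `intersect`, `crop_overlap`, and `pixel_consistency_loss` are hypothetical, and the stop-gradient that a Siamese network would normally apply to one branch is omitted because plain NumPy has no autograd.

```python
import numpy as np


def intersect(box_a, box_b):
    """Return the overlap (x0, y0, x1, y1) of two boxes, or None if disjoint."""
    x0, y0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x1, y1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    if x0 >= x1 or y0 >= y1:
        return None
    return (x0, y0, x1, y1)


def crop_overlap(feat, view_box, overlap):
    """Slice the part of a view's (C, H, W) feature map that covers the overlap.

    Assumption: view_box and overlap share the same integer feature-grid
    coordinates, so location correspondence reduces to index offsets.
    """
    vx0, vy0 = view_box[0], view_box[1]
    ox0, oy0, ox1, oy1 = overlap
    return feat[:, oy0 - vy0:oy1 - vy0, ox0 - vx0:ox1 - vx0]


def pixel_consistency_loss(p, z, eps=1e-8):
    """Mean negative cosine similarity between aligned per-pixel embeddings."""
    p = p / (np.linalg.norm(p, axis=0, keepdims=True) + eps)
    z = z / (np.linalg.norm(z, axis=0, keepdims=True) + eps)
    return float(-(p * z).sum(axis=0).mean())


# Toy usage: two views cut from one shared feature map, so the overlapped
# pixels carry identical features in both views.
rng = np.random.default_rng(0)
full = rng.normal(size=(8, 10, 10))      # (C, H, W) stand-in for image features
box1, box2 = (0, 0, 6, 6), (3, 2, 9, 8)  # (x0, y0, x1, y1)
view1 = full[:, box1[1]:box1[3], box1[0]:box1[2]]
view2 = full[:, box2[1]:box2[3], box2[0]:box2[2]]

overlap = intersect(box1, box2)
loss = pixel_consistency_loss(
    crop_overlap(view1, box1, overlap),
    crop_overlap(view2, box2, overlap),
)
```

Because the two toy views share their underlying features, the aligned pixels match exactly and the loss sits at its minimum, close to -1. In a real setup the two views would pass through augmentation and an encoder, and the loss would be minimized to pull corresponding pixels together.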

Paper title