Paper title
On Equivariant and Invariant Learning of Object Landmark Representations
Paper authors
Paper abstract
Given a collection of images, humans are able to discover landmarks by modeling the shared geometric structure across instances. This idea of geometric equivariance has been widely used for the unsupervised discovery of object landmark representations. In this paper, we develop a simple and effective approach by combining instance-discriminative and spatially-discriminative contrastive learning. We show that when a deep network is trained to be invariant to geometric and photometric transformations, representations emerge from its intermediate layers that are highly predictive of object landmarks. Stacking these across layers in a "hypercolumn" and projecting them using spatially-contrastive learning further improves their performance on matching and few-shot landmark regression tasks. We also present a unified view of existing equivariant and invariant representation learning approaches through the lens of contrastive learning, shedding light on the nature of the invariances learned. Experiments on standard benchmarks for landmark learning, as well as a new challenging one we propose, show that the proposed approach surpasses the prior state of the art.
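The abstract names two ingredients: stacking intermediate-layer features into a hypercolumn, and training per-location descriptors with a spatially contrastive objective. The NumPy sketch below illustrates both. It is a minimal illustration under stated assumptions, not the authors' implementation. The assumptions are nearest-neighbour upsampling, an InfoNCE loss over pixel locations with cosine similarity and a temperature `tau`, and two views that are already aligned pixel-to-pixel. In practice, the known geometric transform between the views would define which locations correspond. All function names are hypothetical.

```python
import numpy as np


def upsample_nearest(feat, out_h, out_w):
    """Nearest-neighbour upsampling of a (C, H, W) map to (C, out_h, out_w)."""
    _, h, w = feat.shape
    rows = (np.arange(out_h) * h) // out_h  # source row for each output row
    cols = (np.arange(out_w) * w) // out_w  # source column for each output column
    return feat[:, rows[:, None], cols[None, :]]


def hypercolumn(feature_maps, out_h, out_w):
    """Upsample each layer's (C_l, H_l, W_l) map and concatenate along channels."""
    ups = [upsample_nearest(f, out_h, out_w) for f in feature_maps]
    return np.concatenate(ups, axis=0)  # shape (sum of C_l, out_h, out_w)


def l2_normalize(x, axis=0, eps=1e-8):
    """Scale vectors along `axis` to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def spatial_infonce(feat_a, feat_b, tau=0.1):
    """InfoNCE over pixel locations of two (C, H, W) descriptor maps.

    Assumption: the views are pre-aligned, so location i in feat_a matches
    location i in feat_b (the positive). Every other location is a negative.
    """
    c = feat_a.shape[0]
    a = l2_normalize(feat_a.reshape(c, -1), axis=0)  # (C, N) unit descriptors
    b = l2_normalize(feat_b.reshape(c, -1), axis=0)
    logits = a.T @ b / tau  # (N, N) cosine similarities scaled by temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))  # mean cross-entropy of the positives


rng = np.random.default_rng(0)
layers = [rng.standard_normal((8, 16, 16)),  # shallow, fine-resolution layer
          rng.standard_normal((16, 8, 8)),
          rng.standard_normal((32, 4, 4))]   # deep, coarse-resolution layer
hc = hypercolumn(layers, 16, 16)
print(hc.shape)  # (56, 16, 16)
same = spatial_infonce(hc, hc)
unrelated = spatial_infonce(hc, rng.standard_normal(hc.shape))
print(same < unrelated)  # True
```

The deeper layers contribute coarse, more semantic channels and the shallow layers contribute fine spatial detail. Concatenating them gives every pixel a descriptor that carries both. The loss is lower when matching locations share descriptors, which is the property a spatially contrastive projection is meant to encourage.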