Paper Title
EquiMod: An Equivariance Module to Improve Self-Supervised Learning
Paper Authors
Abstract
Self-supervised visual representation methods are closing the gap with supervised learning performance. These methods rely on maximizing the similarity between embeddings of related synthetic inputs created through data augmentations. This can be seen as a task that encourages embeddings to leave out the factors modified by these augmentations, i.e. to be invariant to them. However, this only considers one side of the trade-off in the choice of augmentations: they need to strongly modify the images to avoid shortcut learning of simple solutions (e.g. using only color histograms), but on the other hand, augmentation-related information may then be missing from the representations for some downstream tasks (e.g. color is important for bird and flower classification). A few recent works have proposed to mitigate the problem of using only an invariance task by exploring some form of equivariance to augmentations. This has been done by learning additional embedding space(s) in which some augmentation(s) cause embeddings to differ, yet in an uncontrolled way. In this work, we introduce EquiMod, a generic equivariance module that structures the learned latent space, in the sense that our module learns to predict the displacement in the embedding space caused by the augmentations. We show that applying this module to state-of-the-art invariance models, such as SimCLR and BYOL, increases performance on the CIFAR10 and ImageNet datasets. Moreover, while our model could collapse to a trivial equivariance, i.e. invariance, we observe that it instead automatically learns to keep some augmentation-related information beneficial to the representations.
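The core idea of the abstract — a predictor that takes an embedding together with the augmentation parameters and outputs the embedding of the augmented view, i.e. the displacement caused by the augmentation — can be sketched in toy form. This is a minimal illustrative sketch, not the paper's implementation: the encoder, the linear predictor, and all names and shapes are assumptions made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W):
    """Toy encoder mapping an input to an embedding (stands in for a CNN backbone)."""
    return np.tanh(W @ x)

def equivariance_predictor(z, aug_params, U, V):
    """Predict the embedding of the augmented view from (z, aug_params).

    The prediction is the original embedding plus a displacement that is
    conditioned on the augmentation parameters — the equivariance idea.
    """
    return z + U @ z + V @ aug_params

def equiv_loss(z_pred, z_target):
    """Negative cosine similarity between predicted and actual augmented embedding."""
    z_pred = z_pred / np.linalg.norm(z_pred)
    z_target = z_target / np.linalg.norm(z_target)
    return -float(z_pred @ z_target)

# Toy data: one input, a perturbed version standing in for its augmented view,
# and a vector encoding the augmentation parameters (e.g. crop / color jitter).
x = rng.normal(size=16)
aug_params = rng.normal(size=4)
x_aug = x + 0.1 * rng.normal(size=16)

# Randomly initialized toy weights (would be trained jointly in practice).
W = 0.1 * rng.normal(size=(8, 16))
U = 0.1 * rng.normal(size=(8, 8))
V = 0.1 * rng.normal(size=(8, 4))

z1 = encoder(x, W)          # embedding of the original view
z2 = encoder(x_aug, W)      # embedding of the augmented view
loss = equiv_loss(equivariance_predictor(z1, aug_params, U, V), z2)
```

Minimizing this loss pushes the predictor to model how augmentations move points in the latent space; the trivial solution where the predicted displacement is zero for every augmentation would reduce equivariance to plain invariance, which the abstract notes the model avoids in practice.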