Paper title
Masked Siamese ConvNets
Paper authors
Paper abstract
Self-supervised learning has shown superior performance over supervised methods on various vision benchmarks. The siamese network, which encourages embeddings to be invariant to distortions, is one of the most successful self-supervised visual representation learning approaches. Among all augmentation methods, masking is the most general and straightforward: it has the potential to be applied to all kinds of input and requires the least amount of domain knowledge. However, masked siamese networks require a particular inductive bias and in practice only work well with Vision Transformers. This work empirically studies the problems behind masked siamese networks with ConvNets. We propose several empirical designs to overcome these problems gradually. Our method performs competitively on low-shot image classification and outperforms previous methods on object detection benchmarks. We discuss several remaining issues and hope this work can provide useful data points for future general-purpose self-supervised learning.
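To make the abstract's central idea concrete, the following is a minimal sketch of masking used as the augmentation in a siamese setup: two independently masked views of one image are embedded, and the objective penalizes disagreement between the two embeddings. Everything here is an illustrative assumption rather than the paper's method: the helper names (`mask_patches`, `embed`, `cosine_distance`), the zero fill value, the patch size, the masking ratio, and the toy row-mean "encoder" that stands in for a ConvNet.

```python
import math
import random


def mask_patches(image, patch, ratio, rng, fill=0.0):
    """Return a copy of a 2D image with a random subset of patch x patch blocks set to `fill`."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    # Top-left corners of all non-overlapping blocks.
    blocks = [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]
    n_mask = int(round(ratio * len(blocks)))
    for r, c in rng.sample(blocks, n_mask):
        for i in range(r, min(r + patch, h)):
            for j in range(c, min(c + patch, w)):
                out[i][j] = fill
    return out


def embed(image):
    """Toy stand-in for an encoder (hypothetical): the mean of each row."""
    return [sum(row) / len(row) for row in image]


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means the two embeddings point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - dot / (nu * nv)


rng = random.Random(0)
# 8x8 image with strictly positive pixels, so masked pixels (0.0) are easy to count.
image = [[float(i + j + 1) for j in range(8)] for i in range(8)]

view_a = mask_patches(image, patch=2, ratio=0.5, rng=rng)
view_b = mask_patches(image, patch=2, ratio=0.5, rng=rng)

# Invariance objective: embeddings of the two masked views should agree.
loss = cosine_distance(embed(view_a), embed(view_b))
print(f"invariance loss between masked views: {loss:.4f}")
```

In a real system the encoder would be a trained network and the loss would be minimized over many images; the sketch only shows how masked views and an invariance objective fit together.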