Paper Title
Feature Lenses: Plug-and-play Neural Modules for Transformation-Invariant Visual Representations
Paper Authors
Paper Abstract
Convolutional Neural Networks (CNNs) are known to be brittle under various image transformations, including rotations, scaling, and changes in lighting conditions. We observe that the features of a transformed image differ drastically from those of the original image. To make CNNs more invariant to transformations, we propose "Feature Lenses", a set of ad-hoc modules that can be easily plugged into a trained model (referred to as the "host model"). Under a particular transformation, each individual lens reconstructs the original features from the features of the transformed image. Together, the lenses counteract the feature distortions caused by various transformations, making the host model more robust without retraining. Because only the lenses are updated, the host model is spared iterative retraining when it faces new transformations absent from the training data; because feature semantics are preserved, downstream applications such as classifiers and detectors automatically gain robustness without retraining. Lenses are trained in a self-supervised fashion with no annotations, by minimizing a novel "Top-K Activation Contrast Loss" between lens-transformed features and original features. Evaluated on ImageNet, MNIST-rot, and CIFAR-10, Feature Lenses show clear advantages over baseline methods.
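
To make the described setup concrete, here is a minimal PyTorch sketch of one plausible reading of the abstract: a small residual lens attached to the features of a frozen host model, trained to match the original-image features on their strongest activations. The lens architecture, the residual form, the value of K, the host.features interface, and the exact formulation of the "Top-K Activation Contrast Loss" are all assumptions made for illustration, not the paper's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureLens(nn.Module):
    # Hypothetical lens: a small residual block that maps the features of a
    # transformed image back toward the features of the original image.
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Residual correction: the lens only counteracts the distortion,
        # so for an untransformed input it need only output ~0.
        return feats + self.body(feats)


def topk_activation_contrast_loss(lens_feats, orig_feats, k=64):
    # One hypothetical reading of the "Top-K Activation Contrast Loss":
    # penalize disagreement on the K strongest activations of the original
    # features, which carry most of the downstream semantics.
    b = orig_feats.size(0)
    orig_flat = orig_feats.reshape(b, -1)
    lens_flat = lens_feats.reshape(b, -1)
    _, idx = orig_flat.abs().topk(k, dim=1)        # strongest original activations
    return F.mse_loss(lens_flat.gather(1, idx),    # lens output at those positions
                      orig_flat.gather(1, idx))    # vs. the original activations


# Training-loop sketch: the host stays frozen, only the lens is updated.
# `host.features` as a feature extractor and `rotate` are assumed helpers.
# for x in loader:
#     with torch.no_grad():
#         f_orig = host.features(x)               # original-image features
#         f_trans = host.features(rotate(x, 30))  # transformed-image features
#     loss = topk_activation_contrast_loss(lens(f_trans), f_orig)
#     loss.backward(); optimizer.step(); optimizer.zero_grad()

The residual form is a natural fit for the abstract's claim that lenses "counteract feature distortions": the host's features pass through unchanged except for the learned correction, which is what lets downstream classifiers and detectors be reused without retraining.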