Paper Title
Fine-Grained Visual Classification with Efficient End-to-end Localization
Paper Authors
Abstract
The term fine-grained visual classification (FGVC) refers to classification tasks where the classes are very similar and the classification model needs to be able to find subtle differences to make the correct prediction. State-of-the-art approaches often include a localization step designed to help a classification network by localizing the relevant parts of the input images. However, this usually requires multiple iterations or passes through a full classification network, or complex training schedules. In this work, we present an efficient localization module that can be fused with a classification network in an end-to-end setup. On the one hand, the module is trained by the gradient flowing back from the classification network; on the other hand, two self-supervised loss functions are introduced to increase the localization accuracy. We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars, and FGVC-Aircraft, and achieve competitive recognition performance.
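To illustrate the general idea of an end-to-end trainable localization step, the sketch below shows differentiable bilinear cropping: a localization module predicts a crop (center and scale in normalized coordinates), and the crop is extracted by bilinear sampling, so that in an autograd framework the classification loss gradient would flow back into the crop parameters. This is a minimal, hypothetical sketch of the mechanism, not the authors' actual module; the function name and parameters are assumptions for illustration.

```python
import numpy as np

def bilinear_crop(img, cx, cy, scale, out_size=8):
    """Sample an out_size x out_size crop from a 2-D image.

    (cx, cy) is the crop center and `scale` its half-extent, both in
    normalized coordinates [-1, 1]. Bilinear sampling keeps the output
    a smooth function of (cx, cy, scale), which is what makes this
    style of localization trainable end-to-end in an autograd framework.
    Hypothetical sketch, not the paper's exact localization module.
    """
    H, W = img.shape
    # Build the normalized sampling grid for the requested crop.
    ys = np.linspace(cy - scale, cy + scale, out_size)
    xs = np.linspace(cx - scale, cx + scale, out_size)
    # Map normalized coordinates [-1, 1] onto pixel coordinates.
    py = (ys + 1.0) * 0.5 * (H - 1)
    px = (xs + 1.0) * 0.5 * (W - 1)
    out = np.empty((out_size, out_size))
    for i, y in enumerate(py):
        y0 = int(np.clip(np.floor(y), 0, H - 2))
        wy = y - y0
        for j, x in enumerate(px):
            x0 = int(np.clip(np.floor(x), 0, W - 2))
            wx = x - x0
            # Bilinear blend of the four surrounding pixels.
            out[i, j] = ((1 - wy) * (1 - wx) * img[y0, x0]
                         + (1 - wy) * wx * img[y0, x0 + 1]
                         + wy * (1 - wx) * img[y0 + 1, x0]
                         + wy * wx * img[y0 + 1, x0 + 1])
    return out

# Toy image whose value is linear in position, so bilinear sampling is exact.
img = np.arange(32 * 32, dtype=float).reshape(32, 32)
crop = bilinear_crop(img, cx=0.2, cy=-0.1, scale=0.5)
print(crop.shape)  # (8, 8)
```

In a real model, (cx, cy, scale) would be the output of a small localization network, and the cropped region would be fed to the classification network; differentiable sampling of this kind (as popularized by spatial transformer networks) is what lets classification gradients train the localizer without a separate localization label.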