Cross Language Image Matching for Weakly Supervised Semantic Segmentation

Paper Title

Cross Language Image Matching for Weakly Supervised Semantic Segmentation

Authors

Jinheng Xie, Xianxu Hou, Kai Ye, Linlin Shen

Abstract

It is widely known that CAM (Class Activation Map) usually activates only discriminative object regions and falsely includes many object-related backgrounds. Because only a fixed set of image-level object labels is available to the WSSS (weakly supervised semantic segmentation) model, it is very difficult to suppress the diverse background regions that consist of open-set objects. In this paper, we propose a novel Cross Language Image Matching (CLIMS) framework for WSSS, based on the recently introduced Contrastive Language-Image Pre-training (CLIP) model. The core idea of our framework is to introduce natural language supervision to activate more complete object regions and suppress closely-related open background regions. In particular, we design object, background region and text label matching losses to guide the model to activate more reasonable object regions in the CAM of each category. In addition, we design a co-occurring background suppression loss that uses a predefined set of class-related background text descriptions to prevent the model from activating closely-related background regions. These designs enable the proposed CLIMS to generate more complete and compact activation maps for the target objects. Extensive experiments on the PASCAL VOC2012 dataset show that CLIMS significantly outperforms previous state-of-the-art methods.
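The abstract names three text-guided losses but does not give their formulas. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each image-text similarity (for example, between CLIP embeddings of a CAM-masked image and a class prompt) has already been computed and mapped into (0, 1), and it uses a binary cross-entropy-style form for each loss. The helper names (`mask_image`, `to_score`, `matching_losses`), the sigmoid mapping, the averaging over present classes, and `eps` are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embeddings, e.g. an image and a text feature."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def to_score(sim: float, scale: float = 1.0) -> float:
    """Hypothetical mapping of a similarity into (0, 1) with a scaled sigmoid."""
    return 1.0 / (1.0 + math.exp(-scale * sim))


def mask_image(image: Matrix, cam: Matrix, foreground: bool = True) -> Matrix:
    """Weight a 2-D image by a CAM in [0, 1]: X * P_k (object) or X * (1 - P_k) (background)."""
    return [
        [x * (p if foreground else 1.0 - p) for x, p in zip(row_x, row_p)]
        for row_x, row_p in zip(image, cam)
    ]


def matching_losses(
    obj_scores: Sequence[float],
    bg_scores: Sequence[float],
    cobg_scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    eps: float = 1e-6,
) -> Tuple[float, float, float]:
    """Return (object-text, background-text, co-occurring-background) losses.

    Hypothetical sketch; all scores are assumed to lie in (0, 1).
    obj_scores[k]  : score between the object-masked image and the text label of class k
    bg_scores[k]   : score between the background-masked image and the text label of class k
    cobg_scores[k] : scores between the object-masked image and each background description for class k
    labels[k]      : 1 if class k appears in the image-level label, else 0
    """
    positives = [k for k, y in enumerate(labels) if y]
    n = max(len(positives), 1)  # average over present classes; guard against none
    # Object region should match its class text, so a high score lowers the loss.
    l_obj = -sum(math.log(obj_scores[k] + eps) for k in positives) / n
    # Background region should not match the class text; a match means object parts were missed.
    l_bg = -sum(math.log(1.0 - bg_scores[k] + eps) for k in positives) / n
    # Object region should not match co-occurring backgrounds (e.g. "railroad" for "train").
    l_cbs = -sum(math.log(1.0 - s + eps) for k in positives for s in cobg_scores[k]) / n
    return l_obj, l_bg, l_cbs


if __name__ == "__main__":
    labels = [1, 0]        # class 0 (e.g. "train") present, class 1 absent
    obj = [0.9, 0.2]       # made-up scores for illustration only
    bg = [0.1, 0.5]
    cobg = [[0.3], [0.0]]  # e.g. "train" object region vs. the text "railroad"
    print(matching_losses(obj, bg, cobg, labels))
```

In this sketch, a CAM that covers the whole object and excludes co-occurring background (high object score, low background score, low co-occurring-background score) drives all three terms toward zero, which mirrors the "complete and compact" goal stated in the abstract. Absent classes contribute nothing, since the image-level label gives no text to match for them.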
