Paper Title


WegFormer: Transformers for Weakly Supervised Semantic Segmentation

Paper Authors

Chunmeng Liu, Enze Xie, Wenjia Wang, Wenhai Wang, Guangyao Li, Ping Luo

Abstract


Although convolutional neural networks (CNNs) have achieved remarkable progress in weakly supervised semantic segmentation (WSSS), the effective receptive field of CNN is insufficient to capture global context information, leading to sub-optimal results. Inspired by the great success of Transformers in fundamental vision areas, this work for the first time introduces Transformer to build a simple and effective WSSS framework, termed WegFormer. Unlike existing CNN-based methods, WegFormer uses Vision Transformer (ViT) as a classifier to produce high-quality pseudo segmentation masks. To this end, we introduce three tailored components in our Transformer-based framework, which are (1) a Deep Taylor Decomposition (DTD) to generate attention maps, (2) a soft erasing module to smooth the attention maps, and (3) an efficient potential object mining (EPOM) to filter noisy activation in the background. Without any bells and whistles, WegFormer achieves state-of-the-art 70.5% mIoU on the PASCAL VOC dataset, significantly outperforming the previous best method. We hope WegFormer provides a new perspective to tap the potential of Transformer in weakly supervised semantic segmentation. Code will be released.
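The abstract's pipeline hinges on turning classifier attention maps into pseudo segmentation masks. The final step of that process, common to CAM-style WSSS methods, can be sketched as follows. This is an illustrative sketch only: the function name, threshold value, and normalization scheme are assumptions for demonstration, not details taken from the paper.

```python
import numpy as np

def attention_to_pseudo_mask(attn: np.ndarray, fg_thresh: float = 0.4) -> np.ndarray:
    """Turn a per-class attention map into a binary foreground pseudo mask.

    This is the generic last step of CAM-style WSSS pipelines: min-max
    normalize the attention map to [0, 1], then threshold it. WegFormer's
    actual mask generation (DTD attention, soft erasing, EPOM filtering)
    is more involved; this sketch only illustrates the thresholding idea.
    """
    attn = attn.astype(np.float64)
    # Min-max normalize so the threshold is scale-independent.
    rng = attn.max() - attn.min()
    norm = (attn - attn.min()) / (rng + 1e-8)
    # Pixels at or above the (assumed) foreground threshold become mask=1.
    return (norm >= fg_thresh).astype(np.uint8)
```

In a full pipeline, one such mask would be produced per predicted class and the results merged into a pseudo label map used to train the segmentation network.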
