Dilated Convolutions with Lateral Inhibitions for Semantic Image Segmentation

Paper Title


Dilated Convolutions with Lateral Inhibitions for Semantic Image Segmentation

Authors

Yujiang Wang, Mingzhi Dong, Jie Shen, Yiming Lin, Maja Pantic

Abstract


Dilated convolutions are widely used in deep semantic segmentation models because they can enlarge a filter's receptive field without adding extra weights or sacrificing spatial resolution. However, because dilated convolutional filters do not possess positional knowledge about the pixels on semantically meaningful contours, they can lead to ambiguous predictions on object boundaries. In addition, although dilating a filter expands its receptive field, the total number of sampled pixels remains unchanged and usually makes up only a small fraction of the receptive field's total area. Inspired by the Lateral Inhibition (LI) mechanisms in the human visual system, we propose dilated convolutions with lateral inhibitions (LI-Convs) to overcome these limitations. Introducing LI mechanisms improves the convolutional filter's sensitivity to semantic object boundaries. Moreover, since LI-Convs also implicitly take the pixels in the laterally inhibited zones into consideration, they can extract features at a denser scale. By integrating LI-Convs into the Deeplabv3+ architecture, we propose the Lateral Inhibited Atrous Spatial Pyramid Pooling (LI-ASPP), the Lateral Inhibited MobileNet-V2 (LI-MNV2) and the Lateral Inhibited ResNet (LI-ResNet). Experimental results on three benchmark datasets (PASCAL VOC 2012, CelebAMask-HQ and ADE20K) show that our LI-based segmentation models outperform the baseline on all of them, thus verifying the effectiveness and generality of the proposed LI-Convs.
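To make the abstract's point about sampling density concrete, here is a minimal sketch of a plain dilated convolution, not the authors' LI-Conv. It assumes a single-channel input, stride 1, no padding and a k x k kernel. The helper names `dilated_kernel_stats` and `dilated_conv2d` are hypothetical, and the dilation rates in the loop are illustrative only. A 3 x 3 kernel with dilation rate d spans a (2d + 1) x (2d + 1) window but still reads only 9 pixels, so the sampled fraction shrinks quickly as d grows. The lateral-inhibition term itself is not modelled here, because the abstract does not give its formula.

```python
import numpy as np


def dilated_kernel_stats(k: int, d: int) -> tuple:
    """Hypothetical helper: (receptive-field side, sampled pixels, sampled fraction)
    for a k x k kernel with dilation rate d."""
    side = (k - 1) * d + 1   # width of the window the dilated kernel spans
    sampled = k * k          # number of weights (and sampled pixels) is unchanged by d
    return side, sampled, sampled / (side * side)


def dilated_conv2d(x: np.ndarray, w: np.ndarray, d: int) -> np.ndarray:
    """Hypothetical naive single-channel 'valid' dilated convolution.

    Computed as cross-correlation (the convention used by deep-learning
    frameworks), stride 1, no padding. Assumes a square k x k kernel.
    """
    k = w.shape[0]
    side = (k - 1) * d + 1
    h_out = x.shape[0] - side + 1
    w_out = x.shape[1] - side + 1
    out = np.zeros((h_out, w_out), dtype=float)
    for i in range(h_out):
        for j in range(w_out):
            # Take every d-th pixel inside the side x side window: k x k samples.
            patch = x[i:i + side:d, j:j + side:d]
            out[i, j] = np.sum(patch * w)
    return out


for d in (1, 2, 4, 6):
    side, n, frac = dilated_kernel_stats(3, d)
    print(f"d={d}: field {side}x{side}, {n} samples, {frac:.1%} of the area")
```

Running the loop prints 100.0%, 36.0%, 11.1% and 5.3% for d = 1, 2, 4 and 6. The field grows while the 9 samples stay fixed, which is the sparsity the abstract says LI-Convs are meant to compensate for.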
