GINet: Graph Interaction Network for Scene Parsing

Title

GINet: Graph Interaction Network for Scene Parsing

Authors

Tianyi Wu, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, Guodong Guo

Abstract

Recently, context reasoning using image regions beyond local convolution has shown great potential for scene parsing. In this work, we explore how to incorporate linguistic knowledge to promote context reasoning over image regions by proposing a Graph Interaction unit (GI unit) and a Semantic Context Loss (SC-loss). The GI unit enhances the feature representations of convolutional networks with high-level semantics and adaptively learns the semantic coherency of each sample. Specifically, dataset-based linguistic knowledge is first incorporated into the GI unit to promote context reasoning over the visual graph; the evolved representations of the visual graph are then mapped back to each local representation to enhance its discriminative capability for scene parsing. The GI unit is further improved by the SC-loss, which enhances the semantic representations over the exemplar-based semantic graph. We perform full ablation studies to demonstrate the effectiveness of each component of our approach. In particular, the proposed GINet outperforms state-of-the-art approaches on popular benchmarks, including Pascal-Context and COCO Stuff.
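To make the data flow described in the abstract more concrete, here is a minimal pure-Python sketch of one graph-interaction step: pixels are projected onto a visual graph, visual nodes exchange information with semantic nodes, the visual graph is reasoned over, and the result is mapped back to every pixel. This is not the authors' implementation. The soft pixel-to-node projection (`w_proj`), the scaled dot-product attention from visual to semantic nodes, the weight-free similarity-based graph convolution and the residual reprojection are all illustrative assumptions, as are the names `gi_unit_sketch` and the small matrix helpers. The semantic nodes (`sem`) stand in for the dataset-based linguistic knowledge, for example class-name word embeddings projected to the feature dimension; the SC-loss is not modelled.

```python
import math

# Hypothetical sketch of a graph-interaction step; NOT the GINet reference code.
# Matrices are plain lists of rows so the example runs without dependencies.

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def softmax_rows(a):
    """Row-wise softmax, subtracting the row max for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def relu(a):
    return [[max(0.0, v) for v in row] for row in a]

def gi_unit_sketch(x, w_proj, sem):
    """Hypothetical GI-style update.

    x      : N x C pixel features (flattened feature map)
    w_proj : C x K weights defining K visual nodes (assumed learnable)
    sem    : M x C semantic-node features (assumed projected word embeddings)
    Returns N x C enhanced pixel features.
    """
    c = len(x[0])
    # 1. Pixels -> visual graph: softmax over pixels for each node, so every
    #    visual node is a weighted average of pixel features.
    assign = transpose(softmax_rows(transpose(matmul(x, w_proj))))  # N x K
    visual = matmul(transpose(assign), x)                            # K x C
    # 2. Interaction: each visual node attends to the semantic nodes and
    #    adds the attended semantic features as a residual.
    scale = 1.0 / math.sqrt(c)
    logits = [[v * scale for v in row] for row in matmul(visual, transpose(sem))]
    visual = add(visual, matmul(softmax_rows(logits), sem))          # K x C
    # 3. Reasoning on the visual graph: one weight-free graph-convolution
    #    step with a similarity-based adjacency, followed by ReLU.
    adj = softmax_rows(matmul(visual, transpose(visual)))            # K x K
    visual = relu(matmul(adj, visual))                               # K x C
    # 4. Visual graph -> pixels: redistribute node features with the same
    #    assignment and add them back to the local features.
    return add(x, matmul(assign, visual))                            # N x C

# Usage: 3 pixels with 2 channels, 2 visual nodes, 2 semantic nodes.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
proj = [[2.0, -2.0], [-2.0, 2.0]]
sem_nodes = [[1.0, 0.0], [0.0, 1.0]]
enhanced = gi_unit_sketch(feats, proj, sem_nodes)
print(len(enhanced), len(enhanced[0]))  # 3 2
```

A real network would use learnable weight matrices in every step and batched tensor operations on the GPU; plain lists are used here only so that the sketch is self-contained and easy to trace by hand.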
