Paper Title
Visual Concept Reasoning Networks
Paper Authors
Abstract
A split-transform-merge strategy has been broadly used as an architectural constraint in convolutional neural networks for visual recognition tasks. It approximates sparsely connected networks by explicitly defining multiple branches to simultaneously learn representations with different visual concepts or properties. Dependencies or interactions between these representations are typically defined by dense and local operations, however, and thus lack adaptiveness and high-level reasoning. In this work, we propose to exploit this strategy and combine it with our Visual Concept Reasoning Networks (VCRNet) to enable reasoning between high-level visual concepts. We associate each branch with a visual concept and derive a compact concept state by selecting a few local descriptors through an attention module. These concept states are then updated by graph-based interaction and used to adaptively modulate the local descriptors. We describe our proposed model by split-transform-attend-interact-modulate-merge stages, which are implemented by opting for a highly modularized architecture. Extensive experiments on visual recognition tasks such as image classification, semantic segmentation, object detection, scene recognition, and action recognition show that our proposed model, VCRNet, consistently improves the performance while increasing the number of parameters by less than 1%.
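The attend-interact-modulate stages described above can be sketched as a single forward pass. This is a minimal numpy illustration, not the paper's implementation: the function name `vcr_block`, the learned parameters `w_attn`, `adj`, and `w_mod`, and the specific choices of softmax attention, a dense adjacency matrix for the graph interaction, and sigmoid gating for modulation are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def vcr_block(x, w_attn, adj, w_mod):
    """Hypothetical attend-interact-modulate pass over branch features.

    x      : (B, C, N) local descriptors for B branches (one visual concept each)
    w_attn : (B, C)    per-branch attention query (assumed learned)
    adj    : (B, B)    adjacency matrix for the concept graph (assumed learned)
    w_mod  : (B, C)    per-branch modulation weights (assumed learned)
    """
    # Attend: score each of the N local descriptors and pool them
    # into one compact concept state per branch.
    scores = softmax(np.einsum('bcn,bc->bn', x, w_attn), axis=-1)   # (B, N)
    states = np.einsum('bcn,bn->bc', x, scores)                     # (B, C)

    # Interact: graph-based update that mixes concept states across branches.
    states = np.tanh(adj @ states)                                  # (B, C)

    # Modulate: gate each branch's local descriptors channel-wise
    # with its updated concept state.
    gate = 1.0 / (1.0 + np.exp(-(states * w_mod)))                  # (B, C)
    return x * gate[:, :, None]                                     # (B, C, N)
```

The output keeps the shape of the input, so such a block can be dropped between the existing transform and merge stages of a multi-branch architecture with very few extra parameters, matching the under-1% overhead the abstract reports.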