Symbolic image detection using scene and knowledge graphs

Paper title

Symbolic image detection using scene and knowledge graphs

Authors

Kalanat, Nasrin; Kovashka, Adriana

Abstract

Sometimes the meaning conveyed by an image goes beyond the list of objects it contains; instead, the image may express a powerful message intended to influence the viewer. Inferring this message requires reasoning about the relationships between the objects, as well as general common-sense knowledge about the components. In this paper, we use a scene graph, a graph representation of an image, to capture its visual components. In addition, we generate a knowledge graph from facts extracted from ConceptNet to reason about objects and attributes. To detect the symbols, we propose a neural network framework named SKG-Sym. The framework first generates representations of the image's scene graph and of its knowledge graph using a Graph Convolutional Network (GCN). It then fuses the representations and classifies them with an MLP. We further extend the network with an attention mechanism that learns the importance of each graph representation. We evaluate our methods on a dataset of advertisements and compare them with baseline symbolism classification methods (ResNet and VGG). Results show that our methods outperform ResNet in terms of F-score, and that the attention-based variant is competitive with VGG while having much lower model complexity.
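The pipeline the abstract describes (graph convolution over each graph, attention-weighted fusion, MLP classification) can be illustrated with a minimal pure-Python sketch. This is not the authors' SKG-Sym implementation: the toy graphs, the dimensions, the random untrained weights, the mean pooling, the dot-product attention form, and the multi-label sigmoid output are all assumptions made for illustration, and every function name below is hypothetical.

```python
import math
import random

def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by standard GCN layers."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(adj_norm, feats, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    h = matmul(matmul(adj_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in h]

def embed_graph(adj, feats, weight):
    """Graph-level vector: one GCN layer followed by mean pooling (an assumption)."""
    nodes = gcn_layer(normalize_adjacency(adj), feats, weight)
    return [sum(col) / len(nodes) for col in zip(*nodes)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(graph_vecs, query):
    """Weight each graph embedding by softmax(query . vec), then sum them."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in graph_vecs]
    weights = softmax(scores)
    fused = [sum(w * vec[k] for w, vec in zip(weights, graph_vecs))
             for k in range(len(query))]
    return fused, weights

def mlp_classify(vec, w_hidden, w_out):
    """Two-layer MLP with a sigmoid per label (multi-label output is an assumption)."""
    hidden = [max(0.0, v) for v in matmul([vec], w_hidden)[0]]
    logits = matmul([hidden], w_out)[0]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Toy inputs: a 3-node scene graph and a 4-node knowledge graph, 4-dim node features.
rng = random.Random(0)
scene_adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
know_adj = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
scene_vec = embed_graph(scene_adj, rand_matrix(rng, 3, 4), rand_matrix(rng, 4, 3))
know_vec = embed_graph(know_adj, rand_matrix(rng, 4, 4), rand_matrix(rng, 4, 3))

query = [rng.uniform(-0.5, 0.5) for _ in range(3)]
fused, weights = attention_fuse([scene_vec, know_vec], query)
probs = mlp_classify(fused, rand_matrix(rng, 3, 5), rand_matrix(rng, 5, 2))
print("attention weights:", weights)
print("symbol probabilities:", probs)
```

The attention weights sum to 1 and indicate how much each graph contributes to the fused vector. In a trained model the weight matrices and the query vector would be learned rather than sampled at random.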
