Paper Title

Unbiased Scene Graph Generation via Rich and Fair Semantic Extraction

Paper Authors

Bin Wen, Jie Luo, Xianglong Liu, Lei Huang

Paper Abstract

Extracting graph representations of visual scenes from images is a challenging task in computer vision. Although there has been encouraging progress in scene graph generation over the past decade, we surprisingly find that the performance of existing approaches is largely limited by strong biases, which mainly stem from (1) unconsciously assuming that relations have certain semantic properties, such as symmetry, and (2) imbalanced annotations across different relations. To alleviate the negative effects of these biases, we propose a new and simple architecture named the Rich and Fair semantic extraction network (RiFa for short), which not only captures rich semantic properties of relations but also fairly predicts relations with different scales of annotation. Using pseudo-Siamese networks, RiFa embeds the subject and object separately to distinguish their semantic differences while preserving their underlying semantic properties. It then predicts subject-object relations based on both the visual and semantic features of entities within a certain contextual area, and fairly ranks the predictions for relations with few annotations. Experiments on the popular Visual Genome dataset show that RiFa achieves state-of-the-art performance under several challenging settings of the scene graph generation task. In particular, it performs significantly better at capturing different semantic properties of relations and obtains the best overall per-relation performance.
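
The abstract's key architectural claim is the pseudo-Siamese embedding: two branches with the same architecture but unshared weights embed the subject and object separately, so the pair representation is order-sensitive. Below is a minimal PyTorch sketch of that idea; it is not the authors' implementation, and the feature dimensions, the contextual feature, and the relation count are assumptions made for illustration.

```python
# A minimal sketch of the pseudo-Siamese idea, NOT the authors' released code.
# The feature dimension, embedding size, relation count (assumed: 50 Visual
# Genome predicates + background), and contextual feature are illustrative.
import torch
import torch.nn as nn


class PseudoSiameseRelationHead(nn.Module):
    def __init__(self, feat_dim=1024, embed_dim=256, num_relations=51):
        super().__init__()
        # Identical architecture, separate parameters: swapping subject and
        # object changes the output, so asymmetric relations (e.g., "on" vs.
        # "under") stay distinguishable. A weight-shared Siamese pair would
        # reintroduce the symmetry bias the paper identifies.
        self.subj_branch = nn.Sequential(nn.Linear(feat_dim, embed_dim), nn.ReLU())
        self.obj_branch = nn.Sequential(nn.Linear(feat_dim, embed_dim), nn.ReLU())
        # Hypothetical projection of a contextual-area feature, e.g., pooled
        # from the union box of the subject and object regions.
        self.ctx_proj = nn.Linear(feat_dim, embed_dim)
        self.classifier = nn.Linear(3 * embed_dim, num_relations)

    def forward(self, subj_feat, obj_feat, ctx_feat):
        s = self.subj_branch(subj_feat)  # subject-specific embedding
        o = self.obj_branch(obj_feat)    # object-specific embedding
        c = self.ctx_proj(ctx_feat)      # contextual-area feature
        # Concatenate and score all relation classes for the (subject, object) pair.
        return self.classifier(torch.cat([s, o, c], dim=-1))


# Example: relation logits for a batch of 8 candidate subject-object pairs.
head = PseudoSiameseRelationHead()
logits = head(torch.randn(8, 1024), torch.randn(8, 1024), torch.randn(8, 1024))
print(logits.shape)  # torch.Size([8, 51])
```

The design choice worth noting: tying the two branches' weights would make the pair embedding invariant to swapping subject and object, which is exactly the symmetry bias the abstract warns against.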
