Paper Title

Unbiased Scene Graph Generation from Biased Training

Authors

Kaihua Tang, Yulei Niu, Jianqiang Huang, Jiaxin Shi, Hanwang Zhang

Abstract

Today's scene graph generation (SGG) task is still far from practical, mainly due to severe training bias, e.g., collapsing diverse "human walk on / sit on / lay on beach" into "human on beach". Given such SGG, downstream tasks such as VQA can hardly infer better scene structures than merely a bag of objects. However, debiasing in SGG is not trivial, because traditional debiasing methods cannot distinguish between good and bad bias, e.g., good context priors (e.g., "person read book" rather than "eat") and bad long-tailed bias (e.g., "near" dominating "behind / in front of"). In this paper, we present a novel SGG framework based on causal inference rather than the conventional likelihood. We first build a causal graph for SGG and perform traditional biased training with the graph. Then, we propose to draw counterfactual causality from the trained graph to infer the effect of the bad bias, which should be removed. In particular, we use the Total Direct Effect (TDE) as the final predicate score for unbiased SGG. Note that our framework is agnostic to the SGG model and thus can be widely applied in the community seeking unbiased predictions. Using the proposed Scene Graph Diagnosis toolkit on the SGG benchmark Visual Genome and several prevailing models, we observe significant improvements over previous state-of-the-art methods.
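The key inference step described in the abstract, replacing the conventional likelihood with the Total Direct Effect (TDE), can be illustrated with a minimal PyTorch-style sketch. This is not the authors' released implementation; the names ToyRelationHead, pair_feature, context_feature, and mean_feature are illustrative stand-ins. The idea sketched here: the factual prediction uses the real visual feature of the subject-object pair, the counterfactual prediction replaces that feature with a dummy (e.g., mean or zero) feature while keeping the context fixed, and the difference between the two is used as the unbiased predicate score.

    import torch
    import torch.nn as nn

    class ToyRelationHead(nn.Module):
        # Stand-in for any trained SGG relation classifier; the TDE framework is
        # model-agnostic, so a real system would plug in its own relation head.
        def __init__(self, dim=256, num_predicates=51):  # 51 = 50 VG predicates + background (assumed setup)
            super().__init__()
            self.fc = nn.Linear(dim * 2, num_predicates)

        def forward(self, visual_feature, context_feature):
            return self.fc(torch.cat([visual_feature, context_feature], dim=-1))

    def tde_predicate_scores(head, pair_feature, context_feature, mean_feature):
        # Factual branch: the ordinary biased prediction y(x) with the real visual feature.
        factual = head(pair_feature, context_feature)
        # Counterfactual branch: the visual evidence is wiped out (replaced by a
        # mean/zero feature) while the context is kept, so only the bias-driven
        # part of the prediction remains.
        counterfactual = head(mean_feature, context_feature)
        # Total Direct Effect: subtract the bias-only prediction, keeping the good
        # context prior learned in training while removing the bad bias at inference.
        return factual - counterfactual

    # Usage sketch with random features (batch of one subject-object pair):
    head = ToyRelationHead()
    pair = torch.randn(1, 256)
    ctx = torch.randn(1, 256)
    scores = tde_predicate_scores(head, pair, ctx, torch.zeros_like(pair))

Here the counterfactual feature is simply zeros; in practice a dataset-mean feature or another "wiped-out" stand-in could be used, and the relation head would be the trained, biased SGG model rather than a toy classifier.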
