Paper Title

MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering

Paper Authors

Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang

Abstract

While progress has been made on the visual question answering leaderboards, models often utilize spurious correlations and priors in datasets under the i.i.d. setting. As such, evaluation on out-of-distribution (OOD) test samples has emerged as a proxy for generalization. In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input, to improve OOD generalization, such as the VQA-CP challenge. Under this paradigm, models utilize a consistency-constrained training objective to understand the effect of semantic changes in input (question-image pair) on the output (answer). Unlike existing methods on VQA-CP, MUTANT does not rely on the knowledge about the nature of train and test answer distributions. MUTANT establishes a new state-of-the-art accuracy on VQA-CP with a $10.57\%$ improvement. Our work opens up avenues for the use of semantic input mutations for OOD generalization in question answering.
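The abstract describes a consistency-constrained objective that ties a model's behavior on an original (question-image) input to its behavior on a semantically mutated version. As a rough illustration only (the function names and the exact form of the penalty are assumptions, not the paper's actual formulation), one common way to express such a pairwise consistency term is to penalize the gap between how well the model's prediction matches the ground truth on the original sample versus on its mutant:

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors (e.g. answer embeddings).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pairwise_consistency(pred_orig, gt_orig, pred_mut, gt_mut):
    # Hypothetical consistency penalty: the model should match its ground
    # truth equally well on the original input and on the mutated input,
    # even though the two ground-truth answers differ. This is a sketch of
    # the general idea, not MUTANT's exact training objective.
    return abs(cosine(pred_orig, gt_orig) - cosine(pred_mut, gt_mut))
```

For example, if the model fits the original answer perfectly but misses the mutant's answer entirely, the penalty is large; if it matches both equally well, the penalty is zero. In practice such a term would be added, with a weighting coefficient, to the usual answer-classification loss on both members of the pair.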
