Paper Title

MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering

Paper Authors

Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang

Abstract

While progress has been made on the visual question answering leaderboards, models often utilize spurious correlations and priors in datasets under the i.i.d. setting. As such, evaluation on out-of-distribution (OOD) test samples has emerged as a proxy for generalization. In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input, to improve OOD generalization, such as the VQA-CP challenge. Under this paradigm, models utilize a consistency-constrained training objective to understand the effect of semantic changes in input (question-image pair) on the output (answer). Unlike existing methods on VQA-CP, MUTANT does not rely on the knowledge about the nature of train and test answer distributions. MUTANT establishes a new state-of-the-art accuracy on VQA-CP with a $10.57\%$ improvement. Our work opens up avenues for the use of semantic input mutations for OOD generalization in question answering.
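The abstract describes a consistency-constrained objective that ties a model's behavior on an original (question-image) input to its behavior on a semantically mutated version. As a rough illustration only (the function names and the exact form of the penalty are assumptions, not the paper's actual formulation), one common way to express such a pairwise consistency term is to penalize the gap between how well the model's prediction matches the ground truth on the original sample versus on its mutant:

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors (e.g. answer embeddings).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pairwise_consistency(pred_orig, gt_orig, pred_mut, gt_mut):
    # Hypothetical consistency penalty: the model should match its ground
    # truth equally well on the original input and on the mutated input,
    # even though the two ground-truth answers differ. This is a sketch of
    # the general idea, not MUTANT's exact training objective.
    return abs(cosine(pred_orig, gt_orig) - cosine(pred_mut, gt_mut))
```

For example, if the model fits the original answer perfectly but misses the mutant's answer entirely, the penalty is large; if it matches both equally well, the penalty is zero. In practice such a term would be added, with a weighting coefficient, to the usual answer-classification loss on both members of the pair.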
