Paper Title
DeVLBert: Learning Deconfounded Visio-Linguistic Representations
Paper Authors
Paper Abstract
In this paper, we propose to investigate the problem of out-of-domain visio-linguistic pretraining, where the pretraining data distribution differs from that of the downstream data on which the pretrained model will be fine-tuned. Existing methods for this problem are purely likelihood-based, leading to spurious correlations and hurting the generalization ability when transferring to out-of-domain downstream tasks. By spurious correlation, we mean that the conditional probability of one token (object or word) given another can be high (due to dataset biases) without a robust (causal) relationship between them. To mitigate such dataset biases, we propose a Deconfounded Visio-Linguistic Bert framework, abbreviated as DeVLBert, to perform intervention-based learning. We borrow the idea of backdoor adjustment from the research field of causality and propose several neural-network-based architectures for Bert-style out-of-domain pretraining. Quantitative results on three downstream tasks, Image Retrieval (IR), Zero-shot IR, and Visual Question Answering, show the effectiveness of DeVLBert through improved generalization ability.
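To make the core idea concrete, below is a minimal PyTorch sketch of the backdoor adjustment, P(y | do(x)) = Σ_z P(y | x, z) P(z), that the abstract refers to. It is an illustrative approximation only, not the authors' actual architecture: the confounder dictionary `z_dict`, the prior `p_z`, the additive fusion of token and confounder, and all function names are assumptions for the sake of the example.

```python
import torch
import torch.nn.functional as F

def backdoor_adjusted_probs(x, z_dict, p_z, classifier):
    """Approximate P(y | do(x)) = sum_z P(y | x, z) P(z).

    Hypothetical names, not from the paper:
      x:          (batch, dim) token representations
      z_dict:     (num_z, dim) confounder embeddings (e.g., object classes)
      p_z:        (num_z,) prior probability of each confounder
      classifier: maps (batch, dim) features to (batch, num_classes) logits
    """
    batch, dim = x.shape
    num_z = z_dict.shape[0]
    # Pair every token with every confounder (simple additive fusion here).
    x_z = x.unsqueeze(1) + z_dict.unsqueeze(0)          # (batch, num_z, dim)
    logits = classifier(x_z.reshape(-1, dim))           # (batch*num_z, C)
    probs = F.softmax(logits, dim=-1).reshape(batch, num_z, -1)
    # Expectation over the confounder prior P(z), i.e., the intervention.
    return (probs * p_z.view(1, num_z, 1)).sum(dim=1)   # (batch, C)

# Toy usage with random tensors.
x = torch.randn(2, 16)
z_dict = torch.randn(10, 16)
p_z = torch.full((10,), 0.1)
clf = torch.nn.Linear(16, 5)
print(backdoor_adjusted_probs(x, z_dict, p_z, clf).sum(dim=-1))  # ~1 per row
```

The contrast with purely likelihood-based pretraining is that the ordinary conditional P(y | x) would weight confounders by P(z | x), inheriting dataset co-occurrence biases, whereas the adjustment above weights them by the prior P(z), cutting the spurious path through the confounder.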