Paper Title
VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution
Paper Authors
Paper Abstract
The visual dialog task requires an AI agent to interact with humans in multi-round dialogs based on a visual environment. As a common linguistic phenomenon, pronouns are often used in dialogs to improve communication efficiency. As a result, resolving pronouns (i.e., grounding pronouns to the noun phrases they refer to) is an essential step towards understanding dialogs. In this paper, we propose VD-PCR, a novel framework to improve Visual Dialog understanding with Pronoun Coreference Resolution in both implicit and explicit ways. First, to implicitly help models understand pronouns, we design novel methods to jointly train the pronoun coreference resolution and visual dialog tasks. Second, after observing that the coreference relationship between pronouns and their referents indicates the relevance between dialog rounds, we propose to explicitly prune the irrelevant history rounds in visual dialog models' input. With pruned input, the models can focus on relevant dialog history and ignore distractions from irrelevant rounds. With the proposed implicit and explicit methods, VD-PCR achieves state-of-the-art experimental results on the VisDial dataset.
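To make the explicit pruning idea concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes coreference links are already available as a mapping from a round index to the earlier rounds that contain the referents of its pronouns. The function name `prune_history`, the `links` format, and the choice to follow links transitively are all assumptions made for illustration.

```python
from collections import deque


def prune_history(history, links, question_round):
    """Keep only history rounds reachable from the current question
    through coreference links (hypothetical data format).

    history: list of round texts; index 0 may be the image caption.
    links: dict mapping a round index to the set of earlier round
           indices that contain referents of pronouns in that round.
    question_round: index of the current question, i.e. len(history).
    """
    keep = set()
    queue = deque([question_round])
    while queue:
        current = queue.popleft()
        for ref in links.get(current, ()):
            if ref not in keep:
                keep.add(ref)
                # Assumption: follow links transitively, so a referent that is
                # itself a pronoun can still be traced to its noun phrase.
                queue.append(ref)
    # Return the kept rounds in their original order.
    return [history[i] for i in sorted(keep)]


history = [
    "a man is riding a horse",          # round 0 (caption)
    "Q: is he wearing a hat? A: yes",   # round 1, "he" refers to round 0
    "Q: is it sunny? A: yes",           # round 2, unrelated to the question
]
# Current question (round 3): "what color is it?", where "it" refers to the hat.
links = {1: {0}, 3: {1}}
pruned = prune_history(history, links, question_round=3)
```

In this toy example the weather round is dropped, while the caption and the hat round are kept as input context for the current question.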