Rephrasing visual questions by specifying the entropy of the answer distribution

Paper title

Rephrasing visual questions by specifying the entropy of the answer distribution

Authors

Terao, Kento; Tamaki, Toru; Raytchev, Bisser; Kaneda, Kazufumi; Satoh, Shun'ichi

Abstract

Visual question answering (VQA) is the task of answering a visual question, that is, a question paired with an image. Some visual questions are ambiguous and others are clear, and the appropriate level of ambiguity may vary from situation to situation. However, no prior work has addressed this issue. We propose a novel task: rephrasing questions while controlling their ambiguity. The ambiguity of a visual question is defined as the entropy of the answer distribution predicted by a VQA model. Given a source question and an image, the proposed model rephrases the question so that the rephrased question has the ambiguity (or entropy) specified by the user. We propose two learning strategies for training the proposed model on the VQA v2 dataset, which has no ambiguity information. We demonstrate the advantage of our approach, which can control the ambiguity of the rephrased questions, and report an interesting observation: it is harder to increase ambiguity than to reduce it.
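
The entropy-based notion of ambiguity described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `answer_entropy`, the use of the natural logarithm, and the toy probability vectors are all assumptions made for illustration, and the abstract does not say which log base or normalization the paper uses.

```python
import math


def answer_entropy(probs):
    """Shannon entropy (in nats) of a discrete answer distribution.

    `probs` stands in for the per-answer probabilities a VQA model
    would predict; here it is a hypothetical, hand-written list.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Renormalize so slightly unnormalized inputs are still handled.
    normalized = [p / total for p in probs]
    # Zero-probability answers contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in normalized if p > 0)


# Toy distributions (assumed values, not taken from the paper).
clear_question = [0.97, 0.01, 0.01, 0.01]      # one dominant answer
ambiguous_question = [0.25, 0.25, 0.25, 0.25]  # answers equally likely

low = answer_entropy(clear_question)
high = answer_entropy(ambiguous_question)
```

Under this sketch, a question whose predicted answers are concentrated on one candidate gets a low entropy, while a question with evenly spread answers gets the maximum entropy for that number of candidates, which matches the intuition of "clear" versus "ambiguous" questions.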
