Title
CapWAP: Captioning with a Purpose
Authors
Abstract
The traditional image captioning task uses generic reference captions to provide textual information about images. Different user populations, however, will care about different visual aspects of images. In this paper, we propose a new task, Captioning with a Purpose (CapWAP). Our goal is to develop systems that can be tailored to be useful for the information needs of an intended population, rather than merely provide generic information about an image. In this task, we use question-answer (QA) pairs from users (a natural expression of information need), instead of reference captions, for both training and post-inference evaluation. We show that it is possible to use reinforcement learning to directly optimize for the intended information need, by rewarding outputs that allow a question answering model to provide correct answers to sampled user questions. We convert several visual question answering datasets into CapWAP datasets, and demonstrate that under a variety of scenarios our purposeful captioning system learns to anticipate and fulfill specific information needs better than its generic counterparts, as measured by QA performance on user questions from unseen images, when using the caption alone as context.
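The reward idea in the abstract (score a sampled caption by how well a QA model answers user questions from that caption alone) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the answer metric, the QA model, or the policy-gradient baseline, so this sketch assumes SQuAD-style token F1 as the answer score, a hypothetical `qa_model(context, question)` callable, and the mean reward of a group of sampled captions as the REINFORCE baseline. `toy_qa` is a hypothetical stand-in used only to make the example runnable.

```python
import re
import string
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface (assumption): a QA model reads (context, question) and returns an answer.
QAModel = Callable[[str, str], str]


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, then split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    """SQuAD-style token-overlap F1 between a predicted and a gold answer (assumed metric)."""
    pred, ref = normalize(prediction), normalize(gold)
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def caption_reward(caption: str, qa_pairs: Sequence[Tuple[str, str]], qa_model: QAModel) -> float:
    """Mean answer F1 when the QA model sees only the caption as context."""
    scores = [token_f1(qa_model(caption, q), a) for q, a in qa_pairs]
    return sum(scores) / len(scores)


def reinforce_advantages(rewards: Sequence[float]) -> List[float]:
    """Reward minus the group-mean reward (an assumed variance-reducing baseline)."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


def policy_gradient_loss(log_probs: Sequence[float], advantages: Sequence[float]) -> float:
    """REINFORCE surrogate loss; minimizing it raises log-probs of above-baseline captions.

    Plain floats here; in real training log_probs would be differentiable tensors.
    """
    return -sum(a * lp for a, lp in zip(advantages, log_probs)) / len(log_probs)


def toy_qa(context: str, question: str) -> str:
    """Hypothetical stand-in QA model: answers with the first color word in the context."""
    colors = {"red", "blue", "green", "white", "black"}
    for tok in normalize(context):
        if tok in colors:
            return tok
    return ""


# Usage: two sampled captions for the same image, scored against one user QA pair.
qa_pairs = [("what color is the bus?", "red")]
samples = ["a red bus parked on a street", "a bus parked on a street"]
rewards = [caption_reward(c, qa_pairs, toy_qa) for c in samples]
advantages = reinforce_advantages(rewards)
loss = policy_gradient_loss([-1.0, -2.0], advantages)
```

The caption that mentions the bus color earns the higher reward and a positive advantage, so a policy-gradient step would push the captioner toward content that users actually ask about, which is the intent the abstract describes.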