Paper Title
Multi-VQG: Generating Engaging Questions for Multiple Images
Paper Authors
Paper Abstract
Generating engaging content has drawn much recent attention in the NLP community. Asking questions is a natural way to respond to photos and promote awareness. However, most answers to questions in traditional question-answering (QA) datasets are factoids, which reduces individuals' willingness to answer. Furthermore, traditional visual question generation (VQG) confines the source data for question generation to single images, limiting the model's ability to comprehend the time-series information of the underlying event. In this paper, we propose generating engaging questions from multiple images. We present MVQG, a new dataset, and establish a series of baselines, including both end-to-end and dual-stage architectures. Results show that building stories behind the image sequence enables models to generate engaging questions, confirming our assumption that people typically construct a picture of the event in their minds before asking questions. These results open up an exciting challenge for vision-and-language models: implicitly constructing the story behind a series of photos to allow for creativity and experience sharing, thereby drawing attention to downstream applications.
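A minimal sketch of the dual-stage idea mentioned in the abstract, under assumed off-the-shelf components: stage one verbalizes the photo sequence into a textual story, and stage two generates a question conditioned on that story. The specific models (a BLIP captioner and Flan-T5) and the prompt wording are illustrative stand-ins, not the baselines used in the paper.

from transformers import pipeline

# Stage 1: an image captioner to verbalize each photo (stand-in model).
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
# Stage 2: a text-to-text model to turn the story into a question (stand-in model).
question_gen = pipeline("text2text-generation", model="google/flan-t5-base")

def generate_engaging_question(image_paths):
    # Stage 1: caption each image and stitch the captions into a rough "story".
    captions = [captioner(path)[0]["generated_text"] for path in image_paths]
    story = " ".join(captions)
    # Stage 2: condition the generator on the story to ask a non-factoid question.
    prompt = f"Ask an engaging, open-ended question about this story: {story}"
    return question_gen(prompt, max_new_tokens=40)[0]["generated_text"]

# Example usage with a sequence of photos from one event (hypothetical paths).
print(generate_engaging_question(["photo1.jpg", "photo2.jpg", "photo3.jpg"]))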