Paper Title
Multi-VQG: Generating Engaging Questions for Multiple Images
Paper Authors
Paper Abstract
Generating engaging content has drawn much recent attention in the NLP community. Asking questions is a natural way to respond to photos and promote awareness. However, most answers to questions in traditional question-answering (QA) datasets are factoids, which reduces individuals' willingness to answer. Furthermore, traditional visual question generation (VQG) confines the source data for question generation to single images, limiting the model's ability to comprehend the time-series information of the underlying event. In this paper, we propose generating engaging questions from multiple images. We present MVQG, a new dataset, and establish a series of baselines, including both end-to-end and dual-stage architectures. Results show that building stories behind the image sequence enables models to generate engaging questions, confirming our assumption that people typically construct a picture of the event in their minds before asking questions. These results open up an exciting challenge for vision-and-language models: implicitly constructing the story behind a series of photos to allow for creativity and experience sharing, thereby drawing attention to downstream applications.
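A minimal sketch of the dual-stage idea mentioned in the abstract, under assumed off-the-shelf components: stage one verbalizes the photo sequence into a textual story, and stage two generates a question conditioned on that story. The specific models (a BLIP captioner and Flan-T5) and the prompt wording are illustrative stand-ins, not the baselines used in the paper.

from transformers import pipeline

# Stage 1: an image captioner to verbalize each photo (stand-in model).
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
# Stage 2: a text-to-text model to turn the story into a question (stand-in model).
question_gen = pipeline("text2text-generation", model="google/flan-t5-base")

def generate_engaging_question(image_paths):
    # Stage 1: caption each image and stitch the captions into a rough "story".
    captions = [captioner(path)[0]["generated_text"] for path in image_paths]
    story = " ".join(captions)
    # Stage 2: condition the generator on the story to ask a non-factoid question.
    prompt = f"Ask an engaging, open-ended question about this story: {story}"
    return question_gen(prompt, max_new_tokens=40)[0]["generated_text"]

# Example usage with a sequence of photos from one event (hypothetical paths).
print(generate_engaging_question(["photo1.jpg", "photo2.jpg", "photo3.jpg"]))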