Paper Title
Context-Aware Group Captioning via Self-Attention and Contrastive Features
Paper Authors
Paper Abstract
While image captioning has progressed rapidly, existing works focus mainly on describing single images. In this paper, we introduce a new task, context-aware group captioning, which aims to describe a group of target images in the context of another group of related reference images. Context-aware group captioning requires not only summarizing information from both the target and reference image groups but also contrasting between them. To solve this problem, we propose a framework that combines a self-attention mechanism with contrastive feature construction to effectively summarize the common information within each image group while capturing the discriminative information between them. To build the dataset for this task, we propose to group images and generate group captions from single-image captions using scene graph matching. Our datasets are constructed on top of the public Conceptual Captions dataset and our new Stock Captions dataset. Experiments on both datasets show the effectiveness of our method on this new task. The related datasets and code are released at https://lizw14.github.io/project/groupcap.
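The two core ideas in the abstract, self-attention pooling to summarize each image group and a contrastive signal that separates the target group from the reference group, can be illustrated with a minimal sketch. This is not the paper's actual model: the function names, the simple dot-product attention, and the use of a plain feature difference as the contrastive signal are all illustrative assumptions.

```python
import numpy as np

def self_attention_pool(feats):
    """Summarize a group of image features (n, d) into one (d,) vector
    via simple scaled dot-product self-attention pooling (illustrative)."""
    scores = feats @ feats.T / np.sqrt(feats.shape[1])     # (n, n) affinities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)          # softmax per row
    attended = weights @ feats                             # contextualized features
    return attended.mean(axis=0)                           # group summary

def contrastive_features(target_feats, ref_feats):
    """Return both group summaries plus a contrastive vector capturing
    what distinguishes the target group from the reference group.
    (A feature difference is a stand-in for the paper's construction.)"""
    t = self_attention_pool(target_feats)
    r = self_attention_pool(ref_feats)
    return t, r, t - r

rng = np.random.default_rng(0)
t_sum, r_sum, contrast = contrastive_features(
    rng.standard_normal((4, 8)),   # 4 target images, 8-d features
    rng.standard_normal((6, 8)))   # 6 reference images
print(contrast.shape)  # (8,)
```

In the full task, the group summaries and the contrastive vector would jointly condition a caption decoder, so the generated text both describes the shared content of the target group and emphasizes what sets it apart from the reference group.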