Belief Revision based Caption Re-ranker with Visual Semantic Information

Paper Title

Belief Revision based Caption Re-ranker with Visual Semantic Information

Authors

Sabir, Ahmed, Moreno-Noguer, Francesc, Madhyastha, Pranava, Padró, Lluís

Abstract

In this work, we focus on improving the captions generated by image captioning systems. We propose a novel re-ranking approach that leverages visual-semantic measures to identify the caption that best captures the visual information in the image. Our re-ranker uses the Belief Revision framework (Blok et al., 2003) to calibrate the original likelihood of the top-n captions by explicitly exploiting the semantic relatedness between each candidate caption and the visual context. Our experiments demonstrate the utility of our approach: the re-ranker enhances the performance of a typical image captioning system without requiring any additional training or fine-tuning.
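
The abstract does not spell out the update rule. As an illustration only, the sketch below assumes the similarity-based revision of Blok et al. (2003) in the form P(w|c) = P(w)^α, with α = [(1 − sim(w,c)) / (1 + sim(w,c))]^(1 − P(c)). Here P(w) is a caption's original likelihood, P(c) is the confidence of the visual context, and sim(w,c) is their semantic relatedness. The names `Candidate`, `revise` and `rerank` are hypothetical, as are all the numbers. The models that would produce the likelihoods, context labels and similarities are not shown.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One hypothetical top-n caption from a captioning system."""
    caption: str
    p_caption: float  # assumed: original caption likelihood, normalized to (0, 1]
    sim: float        # assumed: semantic relatedness to the visual context


def revise(p_caption: float, p_context: float, sim: float) -> float:
    """Revised belief P(w|c) = P(w) ** alpha (assumed Blok et al., 2003 form)."""
    # Clamp sim to [0, 1) so the ratio stays positive and finite.
    sim = min(max(sim, 0.0), 1.0 - 1e-9)
    # Higher relatedness -> smaller alpha -> larger revised probability.
    # A less confident context (small p_context) strengthens the update.
    alpha = ((1.0 - sim) / (1.0 + sim)) ** (1.0 - p_context)
    return p_caption ** alpha


def rerank(candidates: list[Candidate], p_context: float) -> list[Candidate]:
    """Sort candidates by revised score, best first; no training involved."""
    return sorted(
        candidates,
        key=lambda c: revise(c.p_caption, p_context, c.sim),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical values: a visual context such as "dog" detected with confidence 0.5.
    beam = [
        Candidate("a man riding a surfboard on a wave", 0.30, 0.10),
        Candidate("a dog standing on a surfboard", 0.20, 0.70),
    ]
    for c in rerank(beam, p_context=0.5):
        print(c.caption, round(revise(c.p_caption, 0.5, c.sim), 3))
```

In this toy run, the second caption has a lower original likelihood but is more related to the context, so it overtakes the first. When sim = 0 the likelihood is left unchanged. When P(c) = 1 the exponent is zero and no revision occurs, which reflects the idea that fully expected evidence carries no new information.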
