Towards Annotation-Free Evaluation of Cross-Lingual Image Captioning

Paper Title

Towards Annotation-Free Evaluation of Cross-Lingual Image Captioning

Authors

Aozhu Chen, Xinyi Huang, Hailan Lin, Xirong Li

Abstract

Cross-lingual image captioning, the task of captioning an unlabeled image in a target language other than English, is an emerging topic in the multimedia field. To spare precious human resources from rewriting reference sentences for every target language, this paper makes a brave attempt at annotation-free evaluation of cross-lingual image captioning. Depending on whether English references are assumed to be available, two scenarios are investigated. For the first scenario, where references are available, we propose two metrics, WMDRel and CLinRel. WMDRel measures the semantic relevance between a model-generated caption and the machine translation of an English reference using their Word Mover's Distance. CLinRel is a visually oriented cross-lingual relevance measure obtained by projecting both captions into a deep visual feature space. For the second scenario, which has zero references and is thus more challenging, we propose CMedRel, which computes the cross-media relevance between the generated caption and the image content in the same visual feature space used by CLinRel. The promising results show that the new metrics have high potential for evaluation without references in the target language.
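
WMDRel relies on the Word Mover's Distance (WMD): the minimum cumulative distance that the normalized bag-of-words of one caption must travel in word-embedding space to become the bag-of-words of the other. The Python sketch below computes exact WMD as a small transportation problem solved with min-cost flow. It assumes Euclidean distances between word vectors and captions that are already tokenized (for Chinese, already word-segmented) with every token present in the embedding table. The names `word_movers_distance` and `wmd_rel` and the toy vectors are hypothetical, and the mapping 1 / (1 + WMD) in `wmd_rel` is an illustrative assumption, not the paper's formula.

```python
import math
from collections import Counter


def min_cost_flow(n, edges, source, sink, required):
    """Successive-shortest-path min-cost flow using Bellman-Ford on the residual graph.

    edges: iterable of (u, v, capacity, cost) with integer capacities.
    Returns the minimum total cost of sending `required` units from source to sink.
    """
    graph = [[] for _ in range(n)]  # graph[u] holds [to, residual_cap, cost, reverse_index]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    flow, total_cost = 0, 0.0
    while flow < required:
        # Reverse edges carry negative costs, so Bellman-Ford (not Dijkstra) is used here.
        dist = [math.inf] * n
        prev = [None] * n  # prev[v] = (u, index of edge u->v in graph[u])
        dist[source] = 0.0
        for _ in range(n - 1):
            updated = False
            for u in range(n):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, k)
                        updated = True
            if not updated:
                break
        if dist[sink] == math.inf:
            raise ValueError("required flow is infeasible")

        # Bottleneck capacity along the shortest path, then augment along it.
        push, v = required - flow, sink
        while v != source:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = sink
        while v != source:
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        flow += push
        total_cost += push * dist[sink]
    return total_cost


def word_movers_distance(tokens_a, tokens_b, embeddings):
    """Exact WMD with normalized bag-of-words weights and Euclidean travel cost."""
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    total_a, total_b = sum(count_a.values()), sum(count_b.values())
    words_a, words_b = list(count_a), list(count_b)
    # Integer scaling: word i supplies count*total_b units, word j demands count*total_a.
    units = total_a * total_b
    n = 2 + len(words_a) + len(words_b)
    source, sink = 0, n - 1
    edges = []
    for i, w in enumerate(words_a):
        edges.append((source, 1 + i, count_a[w] * total_b, 0.0))
    for j, w in enumerate(words_b):
        edges.append((1 + len(words_a) + j, sink, count_b[w] * total_a, 0.0))
    for i, wa in enumerate(words_a):
        for j, wb in enumerate(words_b):
            travel = math.dist(embeddings[wa], embeddings[wb])
            edges.append((1 + i, 1 + len(words_a) + j, units, travel))
    # Dividing by `units` turns integer flows back into normalized weights.
    return min_cost_flow(n, edges, source, sink, units) / units


def wmd_rel(candidate_tokens, translated_ref_tokens, embeddings):
    # Hypothetical distance-to-relevance mapping (assumption, not the paper's formula).
    return 1.0 / (1.0 + word_movers_distance(candidate_tokens, translated_ref_tokens, embeddings))


toy_vectors = {"cat": (0.0, 0.0), "dog": (3.0, 4.0), "sits": (1.0, 1.0)}
print(word_movers_distance(["cat", "cat"], ["cat", "dog"], toy_vectors))  # 2.5
```

In the usage line, half of the first caption's mass stays on "cat" at zero cost and half travels to "dog" at distance 5, giving 2.5. A real evaluation pipeline would more likely call an optimized optimal-transport solver; the hand-written flow only keeps the sketch dependency-free.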
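
CLinRel and CMedRel both compare vectors in a deep visual feature space. The sketch below assumes cosine similarity as the relevance score and replaces the learned text-to-visual encoder with a hypothetical stand-in, `project_to_visual_space` (mean word vector followed by a linear map). The real encoder, its training, and the CNN that produces the image feature are not specified here, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def project_to_visual_space(tokens, word_vectors, weights):
    """Hypothetical text-to-visual encoder: mean word vector, then a linear map.

    `weights` is a list of rows, one per visual dimension, each as long as a word
    vector. A trained network (assumed: one per language) would replace this.
    """
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        raise ValueError("no known tokens in caption")
    dim = len(vecs[0])
    mean = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return [sum(w * m for w, m in zip(row, mean)) for row in weights]


def clin_rel(candidate_visual, english_ref_visual):
    # Cross-lingual relevance: both captions compared in the shared visual space.
    return cosine(candidate_visual, english_ref_visual)


def cmed_rel(candidate_visual, image_feature):
    # Cross-media relevance: caption projection vs. the image feature; no reference used.
    return cosine(candidate_visual, image_feature)


# Toy usage with made-up vectors and weights.
toy_words = {"cat": [1.0, 0.0], "sits": [0.0, 1.0]}
toy_weights = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
caption_vec = project_to_visual_space(["cat", "sits"], toy_words, toy_weights)
print(caption_vec)  # [0.5, 1.0, 1.0]
print(cmed_rel(caption_vec, [0.5, 1.0, 1.0]))  # 1.0
```

The design point the sketch illustrates is that CMedRel reuses the projection CLinRel needs but swaps the English-reference vector for the image's own feature, which is what removes the need for any reference.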
