Paper Title
Bench-Marking And Improving Arabic Automatic Image Captioning Through The Use Of Multi-Task Learning Paradigm
Paper Authors
Paper Abstract
The continuous increase in the use of social media and of visual content on the internet has accelerated research in the computer vision field in general and in the image captioning task in particular. Generating a caption that best describes an image is useful for various applications, such as image indexing, and can serve as an aid for the visually impaired. In recent years, the image captioning task has witnessed remarkable advances in both datasets and architectures, and as a result, captioning quality has reached astounding performance. However, the majority of these advances, especially in datasets, target English, leaving other languages such as Arabic lagging behind. Arabic, although spoken by more than 450 million people and among the fastest-growing languages on the internet, lacks the fundamental pillars needed to advance its image captioning research, such as benchmarks and unified datasets. This work is an attempt to expedite progress on this task by providing unified datasets and benchmarks, while also exploring methods and techniques that could enhance the performance of Arabic image captioning. The use of multi-task learning is explored, alongside various word representations and different features. The results show that the use of multi-task learning and pre-trained word embeddings noticeably enhances the quality of image captioning; however, the presented results also show that Arabic captioning still lags behind English. The datasets and code used are available at this link.
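To make the two ingredients highlighted in the abstract concrete, below is a minimal sketch (not the authors' code) of a multi-task image captioning setup: an image representation shared between a caption decoder and an auxiliary head, with the decoder's embedding layer optionally initialised from pre-trained Arabic word vectors. The auxiliary task (image classification), all dimensions, and the loss weighting are illustrative assumptions, not details taken from the paper.

```python
# Sketch of multi-task image captioning with pre-trained word embeddings.
# Assumptions: pre-extracted 2048-d CNN image features, an LSTM decoder,
# and a hypothetical auxiliary classification task sharing the image encoder.
import torch
import torch.nn as nn

class MultiTaskCaptioner(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512,
                 feat_dim=2048, num_aux_classes=80,
                 pretrained_embeddings=None):
        super().__init__()
        # Shared projection of pre-extracted image features.
        self.img_proj = nn.Linear(feat_dim, hidden_dim)
        # Word embeddings, optionally initialised from pre-trained Arabic
        # vectors (e.g. fastText/AraVec), which the abstract reports helps quality.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        if pretrained_embeddings is not None:
            self.embed.weight.data.copy_(pretrained_embeddings)
        # Caption decoder conditioned on the image via its initial hidden state.
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.word_head = nn.Linear(hidden_dim, vocab_size)
        # Auxiliary head sharing the same image representation (multi-task branch).
        self.aux_head = nn.Linear(hidden_dim, num_aux_classes)

    def forward(self, img_feats, captions):
        img_repr = torch.tanh(self.img_proj(img_feats))   # (B, H)
        h0 = img_repr.unsqueeze(0)                         # (1, B, H)
        c0 = torch.zeros_like(h0)
        emb = self.embed(captions)                         # (B, T, E)
        out, _ = self.decoder(emb, (h0, c0))               # (B, T, H)
        word_logits = self.word_head(out)                  # captioning task
        aux_logits = self.aux_head(img_repr)               # auxiliary task
        return word_logits, aux_logits

def multitask_loss(word_logits, targets, aux_logits, aux_labels, aux_weight=0.3):
    # Joint objective: captioning cross-entropy plus a weighted auxiliary loss.
    ce = nn.functional.cross_entropy(
        word_logits.reshape(-1, word_logits.size(-1)), targets.reshape(-1))
    aux = nn.functional.cross_entropy(aux_logits, aux_labels)
    return ce + aux_weight * aux
```

The auxiliary weight and the choice of auxiliary task are the main design levers in such a setup; the paper's actual architecture, tasks, and hyperparameters may differ.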