Data augmentation techniques for the Video Question Answering task

Paper title

Data augmentation techniques for the Video Question Answering task

Authors

Alex Falcon, Oswald Lanz, Giuseppe Serra

Abstract

Video Question Answering (VideoQA) is a task that requires a model to analyze and understand both the visual content given by the input video and the textual part given by the question, as well as the interaction between them, in order to produce a meaningful answer. In our work we focus on the Egocentric VideoQA task, which exploits first-person videos, because of the importance of such a task, which can have an impact on many different fields, such as those pertaining to social assistance and industrial training. Recently, an Egocentric VideoQA dataset, called EgoVQA, has been released. Given its small size, models tend to overfit quickly. To alleviate this problem, we propose several augmentation techniques, which give us a +5.5% improvement in final accuracy over the considered baseline.
