Paper Title
Graph Based Temporal Aggregation for Video Retrieval
Paper Authors
Paper Abstract
Large-scale video retrieval is an active field of study with much ongoing research. Most of the work in the field addresses video retrieval through text queries, using techniques such as VSE++. However, little research has been done on video retrieval through image queries, and the work that does exist either uses image queries drawn from within the video dataset or iterates through videos frame by frame. These approaches do not generalize to queries from outside the dataset and do not scale well to large video datasets. To overcome these issues, we propose a new approach for video retrieval through image queries in which an undirected graph is constructed from the combined set of frames from all videos to be searched. The node features of this graph are used for the task of video retrieval. Experimentation is done on the MSR-VTT dataset using query images from outside the dataset. To evaluate this novel approach, the P@5, P@10, and P@20 metrics are calculated. Two different ResNet models, namely ResNet-152 and ResNet-50, are used in this study.
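As a rough illustration of the components named in the abstract, the sketch below shows one hypothetical way to build an undirected k-NN graph over ResNet frame features and the standard P@k computation used for evaluation. The cosine-similarity k-NN construction, the choice of k, and the helper names are assumptions for illustration only, not the paper's actual graph-building or aggregation procedure.

```python
import numpy as np

def build_knn_graph(frame_feats, k=5):
    """Hypothetical sketch: connect each frame to its k most similar frames
    (cosine similarity) to form an undirected graph over all videos' frames."""
    feats = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    sim = feats @ feats.T                      # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)             # exclude self-loops
    adj = np.zeros_like(sim, dtype=bool)
    nn = np.argsort(-sim, axis=1)[:, :k]       # k nearest neighbours per frame
    for i, nbrs in enumerate(nn):
        adj[i, nbrs] = True
    return adj | adj.T                         # symmetrize -> undirected graph

def precision_at_k(retrieved_video_ids, relevant_video_ids, k):
    """P@k: fraction of the top-k retrieved videos that are relevant."""
    top_k = retrieved_video_ids[:k]
    return sum(v in relevant_video_ids for v in top_k) / k

# Example: P@5 for a ranked list where 3 of the first 5 results are relevant.
ranked = [3, 7, 1, 9, 4, 2]
relevant = {3, 1, 4}
print(precision_at_k(ranked, relevant, k=5))   # 0.6
```

In practice the frame features would come from a pretrained ResNet-50 or ResNet-152 backbone, and ranking a query image would amount to comparing its feature vector against the graph's node features; the details of that ranking step are left to the paper itself.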