Paper Title
Encode the Unseen: Predictive Video Hashing for Scalable Mid-Stream Retrieval
Paper Authors
Paper Abstract
This paper tackles a new problem in computer vision: mid-stream video-to-video retrieval. This task, which consists in searching a database for content similar to a video right as it is playing, e.g., from a live stream, exhibits challenging characteristics: only the beginning part of the video is available as the query, and new frames are constantly added as the video plays out. To perform retrieval in this demanding situation, we propose an approach based on a binary encoder that is both predictive and incremental in order to (1) account for the missing video content at query time and (2) keep up with repeated, continuously evolving queries throughout the stream. In particular, we present the first hashing framework that infers the unseen future content of a currently playing video. Experiments on FCVID and ActivityNet demonstrate the feasibility of this task. Our approach also yields a significant mAP@20 performance increase compared to a baseline adapted from the literature for this task, for instance a 7.4% (2.6%) increase at 20% (50%) of elapsed runtime on FCVID using bitcodes of size 192 bits.
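For illustration only, the sketch below shows how mid-stream retrieval with binary codes can be simulated and scored: a stand-in encoder maps the partially observed video to a 192-bit code, the database is ranked by Hamming distance, and AP@20 is computed at 20%, 50%, and 100% of elapsed runtime. This is a hypothetical toy setup under assumed names (`encode_partial_video`, `hamming_rank`, a random database with category labels), not the authors' predictive encoder or evaluation code.

```python
import numpy as np

# Toy setup (assumption, not the paper's pipeline): each database video is
# summarized by a 192-bit code; relevance is defined by matching category label.
rng = np.random.default_rng(0)
N_DB, N_BITS = 1000, 192
db_codes = rng.integers(0, 2, size=(N_DB, N_BITS), dtype=np.uint8)  # database bitcodes
db_labels = rng.integers(0, 20, size=N_DB)                          # toy category labels


def hamming_rank(query_code: np.ndarray, codes: np.ndarray) -> np.ndarray:
    """Rank database items by Hamming distance to the query bitcode."""
    dists = np.count_nonzero(codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")


def average_precision_at_k(ranked_labels: np.ndarray, query_label: int, k: int = 20) -> float:
    """AP@k, normalized by the number of relevant items in the top k (one common convention)."""
    rel = (ranked_labels[:k] == query_label).astype(float)
    if rel.sum() == 0:
        return 0.0
    precision_at_i = np.cumsum(rel) / (np.arange(k) + 1)
    return float((precision_at_i * rel).sum() / rel.sum())


def encode_partial_video(frames_seen: int) -> np.ndarray:
    """Stand-in for the predictive binary encoder described in the abstract.

    Here it just returns a random bitcode; in the paper's setting it would infer
    a code that also accounts for the video content not yet observed.
    """
    return rng.integers(0, 2, size=N_BITS, dtype=np.uint8)


# Mid-stream querying: re-encode and re-query as more of the video has played.
query_label = 3
for elapsed in (0.2, 0.5, 1.0):
    q = encode_partial_video(int(elapsed * 300))
    order = hamming_rank(q, db_codes)
    ap = average_precision_at_k(db_labels[order], query_label, k=20)
    print(f"elapsed={elapsed:.0%}  AP@20={ap:.3f}")
```

Averaging AP@20 over a set of queries gives the mAP@20 figure reported in the abstract; in the paper's incremental setting, only the query bitcode is recomputed as new frames arrive, while the database codes stay fixed.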