Paper Title

Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization

Paper Authors

Zongshang Pang, Yuta Nakashima, Mayu Otani, Hajime Nagahara

Paper Abstract

Video summarization aims to select the most informative subset of frames in a video to facilitate efficient video browsing. Unsupervised methods usually rely on heuristic training objectives such as diversity and representativeness. However, such methods need to bootstrap the online-generated summaries to compute the objectives for importance score regression. We consider such a pipeline inefficient and seek to directly quantify the frame-level importance with the help of contrastive losses in the representation learning literature. Leveraging the contrastive losses, we propose three metrics featuring a desirable key frame: local dissimilarity, global consistency, and uniqueness. With features pre-trained on the image classification task, the metrics can already yield high-quality importance scores, demonstrating competitive or better performance than past heavily-trained methods. We show that by refining the pre-trained features with a lightweight contrastively learned projection module, the frame-level importance scores can be further improved, and the model can also leverage a large number of random videos and generalize to test videos with decent performance. Code available at https://github.com/pangzss/pytorch-CTVSUM.
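To make the idea concrete, here is a minimal sketch of scoring frames directly from pre-trained features, in the spirit of the metrics named in the abstract. The exact definitions of local dissimilarity, global consistency, and uniqueness are given in the paper itself; the cosine-similarity formulas below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + 1e-8)
    return a @ b.T

def importance_scores(feats, window=5):
    """Hypothetical frame-level importance scores.

    feats: (T, D) array of per-frame features (e.g. from an
    image-classification backbone). Returns two (T,) arrays:
    - local dissimilarity: 1 minus the mean similarity to temporal
      neighbors within `window` frames (an illustrative choice);
    - global consistency: similarity to the mean video feature.
    """
    T = feats.shape[0]
    sim = cosine_sim(feats, feats)                      # (T, T)
    mean_feat = feats.mean(axis=0, keepdims=True)       # (1, D)
    global_consistency = cosine_sim(feats, mean_feat)[:, 0]

    local_dissim = np.zeros(T)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        neighbors = [i for i in range(lo, hi) if i != t]
        local_dissim[t] = 1.0 - sim[t, neighbors].mean()
    return local_dissim, global_consistency
```

In this reading, a frame scores highly as a key frame when it differs from its immediate neighbors (local dissimilarity) while still agreeing with the video's overall content (global consistency); the refinement described in the abstract would replace `feats` with the output of the learned projection module.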
