Paper Title

MHSCNet: A Multimodal Hierarchical Shot-aware Convolutional Network for Video Summarization

Authors

Wujiang Xu, Runzhong Wang, Xiaobo Guo, Shaoshuai Li, Qiongxu Ma, Yunan Zhao, Sheng Guo, Zhenfeng Zhu, Junchi Yan

Abstract

Video summarization aims to produce a concise video summary by effectively capturing and combining the most informative parts of the whole content. Existing approaches regard the task as a frame-wise keyframe selection problem and generally construct the frame-wise representation by combining long-range temporal dependency with unimodal or bimodal information. However, an optimal video summary needs to reflect the most valuable keyframes, considering both their own information and their semantic relevance to the whole content. Thus, it is critical to construct a more powerful and robust frame-wise representation and to predict frame-level importance scores in a fair and comprehensive manner. To tackle these issues, we propose a multimodal hierarchical shot-aware convolutional network, denoted as MHSCNet, to enhance the frame-wise representation by combining the comprehensive multimodal information available. Specifically, we design a hierarchical ShotConv network to incorporate the adaptive shot-aware frame-level representation by considering short-range and long-range temporal dependency. Based on the learned shot-aware representations, MHSCNet can predict frame-level importance scores in both the local and global views of the video. Extensive experiments on two standard video summarization datasets demonstrate that our proposed method consistently outperforms state-of-the-art baselines. The source code will be made publicly available.
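
The abstract only outlines the architecture, so the following is a minimal PyTorch-style sketch (not the authors' released code) of the general idea: frame features pass through convolutions capturing short-range and long-range temporal dependency plus a pooled shot-level context, and a frame-level head outputs importance scores. The module names, kernel sizes, fixed shot length, feature dimension, and the omission of multimodal fusion are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShotConvSketch(nn.Module):
    """Illustrative sketch of a shot-aware convolutional block (not the official MHSCNet code)."""

    def __init__(self, feat_dim: int = 1024, shot_len: int = 8):
        super().__init__()
        self.shot_len = shot_len  # assumed fixed number of frames per "shot"
        # Short-range temporal dependency: small receptive field over neighboring frames.
        self.short_conv = nn.Conv1d(feat_dim, feat_dim, kernel_size=3, padding=1)
        # Long-range temporal dependency: dilated convolution spanning a wider window.
        self.long_conv = nn.Conv1d(feat_dim, feat_dim, kernel_size=3, padding=4, dilation=4)
        # Frame-level importance head: one score in [0, 1] per frame.
        self.score_head = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, num_frames, feat_dim), e.g. pooled CNN features per frame.
        x = frame_feats.transpose(1, 2)              # (batch, feat_dim, num_frames)
        short_ctx = torch.relu(self.short_conv(x))   # short-range (intra-shot) context
        long_ctx = torch.relu(self.long_conv(x))     # long-range (cross-shot) context
        # Shot-level context: average frames within each fixed-length shot, then
        # broadcast the shot descriptor back to the frames of that shot.
        shot_ctx = F.avg_pool1d(x, self.shot_len, stride=self.shot_len, ceil_mode=True)
        shot_ctx = F.interpolate(shot_ctx, size=x.shape[-1], mode="nearest")
        fused = (x + short_ctx + long_ctx + shot_ctx).transpose(1, 2)  # (batch, num_frames, feat_dim)
        return self.score_head(fused).squeeze(-1)    # (batch, num_frames) importance scores


if __name__ == "__main__":
    feats = torch.randn(2, 64, 1024)      # 64 frames with 1024-d features (illustrative)
    print(ShotConvSketch()(feats).shape)  # torch.Size([2, 64])
```

In the paper's setting, such frame-level scores would then be aggregated (e.g. via shot segmentation and knapsack-style selection, as is standard on SumMe/TVSum-type benchmarks) to form the final summary; that post-processing is omitted here.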
