Paper Title
Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution
Paper Authors
Paper Abstract
In this paper, we explore the space-time video super-resolution task, which aims to generate a high-resolution (HR) slow-motion video from a low frame rate (LFR), low-resolution (LR) video. A simple solution is to split it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR). However, temporal interpolation and spatial super-resolution are intra-related in this task, and two-stage methods cannot fully exploit this natural property. In addition, state-of-the-art VFI and VSR networks require a large frame-synthesis or reconstruction module to predict high-quality video frames, which gives two-stage methods large model sizes and makes them time-consuming. To overcome these problems, we propose a one-stage space-time video super-resolution framework that directly synthesizes an HR slow-motion video from an LFR, LR video. Rather than synthesizing missing LR video frames as VFI networks do, we first temporally interpolate LR frame features for the missing frames, capturing local temporal contexts with the proposed feature temporal interpolation network. Then, we propose a deformable ConvLSTM to simultaneously align and aggregate temporal information, better leveraging global temporal contexts. Finally, a deep reconstruction network is adopted to predict HR slow-motion video frames. Extensive experiments on benchmark datasets demonstrate that the proposed method not only achieves better quantitative and qualitative performance but is also more than three times faster than recent two-stage state-of-the-art methods, e.g., DAIN+EDVR and DAIN+RBPN.
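To make the described pipeline concrete, below is a minimal PyTorch sketch of the one-stage data flow: extract LR frame features, temporally interpolate features for the missing in-between frames, aggregate temporal context recurrently, and reconstruct HR frames end to end. This is an illustrative assumption, not the authors' released code: plain convolutions stand in for the paper's deformable feature temporal interpolation and the deformable alignment inside the ConvLSTM, and all module names (`FeatureTemporalInterpolation`, `OneStageSTVSR`, etc.) are hypothetical.

```python
# Minimal sketch of a one-stage space-time VSR forward pass, assuming plain
# convolutions in place of the paper's deformable modules. Hypothetical names.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureTemporalInterpolation(nn.Module):
    """Synthesizes the feature map of a missing intermediate frame from its
    two LR neighbours (a plain-conv stand-in for deformable sampling)."""

    def __init__(self, channels=64):
        super().__init__()
        self.blend = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, feat_prev, feat_next):
        return F.relu(self.blend(torch.cat([feat_prev, feat_next], dim=1)))


class ConvLSTMCell(nn.Module):
    """Vanilla ConvLSTM cell; the paper additionally aligns the hidden state
    with deformable convolutions before the gating step."""

    def __init__(self, channels=64):
        super().__init__()
        self.gates = nn.Conv2d(2 * channels, 4 * channels, 3, padding=1)

    def forward(self, x, h, c):
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class OneStageSTVSR(nn.Module):
    """Feature extraction -> feature temporal interpolation -> recurrent
    temporal aggregation -> upsampling reconstruction, end to end."""

    def __init__(self, channels=64, scale=4):
        super().__init__()
        self.channels = channels
        self.extract = nn.Conv2d(3, channels, 3, padding=1)
        self.interp = FeatureTemporalInterpolation(channels)
        self.cell = ConvLSTMCell(channels)
        # Pixel-shuffle head as a stand-in for the deep reconstruction network.
        self.up = nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, lr_frames):  # lr_frames: (B, T, 3, H, W), the LFR input
        b, t, _, height, width = lr_frames.shape
        feats = [self.extract(lr_frames[:, i]) for i in range(t)]
        # Interleave interpolated features for the missing in-between frames,
        # so T input frames yield 2T - 1 feature maps.
        full = []
        for i in range(t - 1):
            full += [feats[i], self.interp(feats[i], feats[i + 1])]
        full.append(feats[-1])
        # Aggregate global temporal context with the recurrent cell.
        h = lr_frames.new_zeros(b, self.channels, height, width)
        c = torch.zeros_like(h)
        outputs = []
        for feat in full:
            h, c = self.cell(feat, h, c)
            outputs.append(self.shuffle(self.up(h)))
        return torch.stack(outputs, dim=1)  # (B, 2T-1, 3, sH, sW) HR slow-mo


if __name__ == "__main__":
    video = torch.randn(1, 4, 3, 32, 32)    # 4 LR frames at a low frame rate
    print(OneStageSTVSR()(video).shape)     # torch.Size([1, 7, 3, 128, 128])
```

Note how the sketch mirrors the abstract's key design choice: interpolation happens in feature space rather than pixel space, so a single reconstruction head serves both the input and the synthesized frames, which is what lets one stage replace the two large synthesis/reconstruction modules of a VFI+VSR cascade.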