Efficient Feature Extraction for High-resolution Video Frame Interpolation

Paper title

Efficient Feature Extraction for High-resolution Video Frame Interpolation

Authors

Moritz Nottebaum, Stefan Roth, Simone Schaub-Meyer

Abstract

Most deep learning methods for video frame interpolation consist of three main components: feature extraction, motion estimation, and image synthesis. Existing approaches are mainly distinguishable in terms of how these modules are designed. However, when interpolating high-resolution images, e.g. at 4K, the design choices for achieving high accuracy within reasonable memory requirements are limited. The feature extraction layers help to compress the input and extract relevant information for the latter stages, such as motion estimation. However, these layers are often costly in parameters, computation time, and memory. We show how ideas from dimensionality reduction combined with a lightweight optimization can be used to compress the input representation while keeping the extracted information suitable for frame interpolation. Further, we require neither a pretrained flow network nor a synthesis network, additionally reducing the number of trainable parameters and required memory. When evaluating on three 4K benchmarks, we achieve state-of-the-art image quality among the methods without pretrained flow while having the lowest network complexity and memory requirements overall.
