Paper Title
FILM: Frame Interpolation for Large Motion
Paper Authors
Paper Abstract
We present a frame interpolation algorithm that synthesizes multiple intermediate frames from two input images with large in-between motion. Recent methods use multiple networks to estimate optical flow or depth, plus a separate network dedicated to frame synthesis. This is often complex and requires scarce optical flow or depth ground truth. In this work, we present a single unified network, distinguished by a multi-scale feature extractor that shares weights across all scales, and trainable from frames alone. To synthesize crisp and pleasing frames, we propose to optimize our network with the Gram matrix loss, which measures the correlation difference between feature maps. Our approach outperforms state-of-the-art methods on the Xiph large motion benchmark. We also achieve higher scores on Vimeo-90K, Middlebury, and UCF101 when compared to methods that use perceptual losses. We study the effect of weight sharing and of training with datasets of increasing motion range. Finally, we demonstrate our model's effectiveness in synthesizing high-quality and temporally coherent videos on a challenging near-duplicate photos dataset. Code and pre-trained models are available at https://film-net.github.io.
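To make the two key technical ideas in the abstract concrete, here is a minimal PyTorch sketch of a Gram matrix loss in its standard (style-loss) formulation: the Gram matrix captures correlations between feature channels, and the loss penalizes the difference between the Gram matrices of predicted and ground-truth frames. The choice of feature extractor, normalization, and L1 distance are assumptions for illustration, not details taken from the abstract.

```python
import torch

def gram_matrix(feats: torch.Tensor) -> torch.Tensor:
    """Channel-correlation (Gram) matrix of a (B, C, H, W) feature map."""
    b, c, h, w = feats.shape
    f = feats.reshape(b, c, h * w)
    # Inner product between every pair of channels, normalized by tensor size.
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def gram_loss(pred_feats: torch.Tensor, target_feats: torch.Tensor) -> torch.Tensor:
    """L1 distance between Gram matrices of predicted and ground-truth features.

    Both inputs are assumed to come from the same fixed feature extractor
    (e.g., a pretrained VGG); this is an illustrative assumption, not the
    paper's exact loss configuration.
    """
    return (gram_matrix(pred_feats) - gram_matrix(target_feats)).abs().mean()
```

Likewise, the weight-shared multi-scale feature extractor can be pictured as a single small encoder reused on every level of an image pyramid, so that large motion at fine scales looks like small motion at coarse scales while all levels share parameters. This is a hypothetical sketch of the idea only; the actual FILM architecture differs in its details.

```python
import torch.nn as nn
import torch.nn.functional as F

class SharedPyramidExtractor(nn.Module):
    """One conv encoder applied at every pyramid scale with shared weights."""

    def __init__(self, channels: int = 32, levels: int = 4):
        super().__init__()
        self.levels = levels
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )

    def forward(self, image):
        feats, x = [], image
        for _ in range(self.levels):
            feats.append(self.encoder(x))       # same weights at every scale
            x = F.avg_pool2d(x, kernel_size=2)  # next coarser pyramid level
        return feats
```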