Paper Title
Vid-ODE: Continuous-Time Video Generation with Neural Ordinary Differential Equation
Paper Authors
Paper Abstract
Video generation models often operate under the assumption of fixed frame rates, which leads to suboptimal performance when handling flexible frame rates (e.g., increasing the frame rate for the more dynamic portions of a video, or handling missing video frames). To resolve the restricted ability of existing video generation models to handle arbitrary timesteps, we propose continuous-time video generation by combining a neural ODE (Vid-ODE) with pixel-level video processing techniques. Using ODE-ConvGRU as an encoder, a convolutional version of the recently proposed neural ODE that enables us to learn continuous-time dynamics, Vid-ODE learns the spatio-temporal dynamics of input videos with flexible frame rates. The decoder integrates the learned dynamics function to synthesize video frames at any given timestep, where a pixel-level composition technique is used to maintain the sharpness of individual frames. Through extensive experiments on four real-world video datasets, we verify that the proposed Vid-ODE outperforms state-of-the-art approaches under various video generation settings, both within the trained time range (interpolation) and beyond it (extrapolation). To the best of our knowledge, Vid-ODE is the first work to successfully perform continuous-time video generation using real-world videos.
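The abstract above describes the pipeline only at a high level (ODE-ConvGRU encoder, a learned continuous dynamics function, a decoder queried at arbitrary times, and pixel-level composition). The sketch below is a simplified, hypothetical reading of that pipeline, not the authors' implementation: a ConvGRU cell whose hidden state is evolved between irregular observation times by a learned ODE (integrated here with a plain fixed-step Euler loop rather than an adaptive solver), and a decoder that, at arbitrary query times, blends a synthesized image with the last observed frame via a predicted mask as a stand-in for the paper's pixel-level composition. All module names, channel sizes, and the exact composition formula are assumptions.

```python
# Minimal sketch of a Vid-ODE-style model (assumptions throughout; not the authors' code).
import torch
import torch.nn as nn


class ConvGRUCell(nn.Module):
    """Convolutional GRU cell used to update the hidden state at each observation."""
    def __init__(self, ch):
        super().__init__()
        self.gates = nn.Conv2d(2 * ch, 2 * ch, 3, padding=1)  # update and reset gates
        self.cand = nn.Conv2d(2 * ch, ch, 3, padding=1)       # candidate hidden state

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde


class ODEFunc(nn.Module):
    """dh/dt parameterized by a small convolutional network (the learned dynamics)."""
    def __init__(self, ch):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.Tanh(),
                                 nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, h):
        return self.net(h)


def odeint_euler(func, h, t0, t1, steps=10):
    """Fixed-step Euler integration, standing in for an adaptive ODE solver."""
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * func(h)
    return h


class VidODESketch(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.enc = nn.Conv2d(3, ch, 3, padding=1)
        self.cell = ConvGRUCell(ch)
        self.ode = ODEFunc(ch)
        # Decoder head: a synthesized image, a correction image, and a blend mask.
        self.dec = nn.Conv2d(ch, 3 + 3 + 1, 3, padding=1)

    def forward(self, frames, in_times, out_times):
        # frames: (B, T, 3, H, W) observed at irregular, ascending times in_times.
        B, T, _, H, W = frames.shape
        h = torch.zeros(B, self.enc.out_channels, H, W, device=frames.device)
        t_prev = in_times[0]
        # ODE-ConvGRU encoder: evolve h continuously, then update at each observation.
        for i in range(T):
            h = odeint_euler(self.ode, h, t_prev, in_times[i])
            h = self.cell(self.enc(frames[:, i]), h)
            t_prev = in_times[i]
        outputs = []
        last_frame = frames[:, -1]
        # Decode at arbitrary (continuous) query times, inside or beyond the input range.
        for t in out_times:
            h = odeint_euler(self.ode, h, t_prev, t)
            t_prev = t
            out = self.dec(h)
            synth, diff, mask = out[:, :3], out[:, 3:6], torch.sigmoid(out[:, 6:])
            # Simplified pixel-level composition: blend a synthesized image with the
            # last observed frame (plus a predicted correction) using the mask.
            outputs.append(mask * torch.tanh(synth)
                           + (1 - mask) * (last_frame + torch.tanh(diff)))
        return torch.stack(outputs, dim=1)
```

As a usage example under the same assumptions, `VidODESketch()(frames, [0.0, 0.3, 1.0], [0.5, 1.5])` would consume three irregularly spaced input frames and return frames at one interpolated time (0.5) and one extrapolated time (1.5).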