SimVP: Simpler yet Better Video Prediction

Paper title

SimVP: Simpler yet Better Video Prediction

Authors

Gao, Zhangyang; Tan, Cheng; Wu, Lirong; Li, Stan Z.

Abstract

From CNN, RNN, to ViT, we have witnessed remarkable advancements in video prediction, incorporating auxiliary inputs, elaborate neural architectures, and sophisticated training strategies. We admire this progress but question its necessity: is there a simple method that can perform comparably well? This paper proposes SimVP, a simple video prediction model that is built entirely upon CNNs and trained end-to-end with an MSE loss. Without introducing any additional tricks or complicated strategies, we achieve state-of-the-art performance on five benchmark datasets. Through extended experiments, we demonstrate that SimVP has strong generalization and extensibility on real-world datasets. The significant reduction in training cost makes it easier to scale to complex scenarios. We believe SimVP can serve as a solid baseline to stimulate the further development of video prediction. The code is available on GitHub at https://github.com/gaozhangyang/SimVP-Simpler-yet-Better-Video-Prediction.
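The abstract's core recipe is a purely convolutional predictor trained end-to-end from pixels to pixels with a mean squared error loss. The NumPy sketch below illustrates only that general idea under heavy simplifying assumptions, and it is not the SimVP architecture from the paper. It assumes a single linear 3x3 convolution, stacks the past frames along the channel axis, and trains on a synthetic clip of a random pattern drifting one pixel per frame. The function names (`conv2d_same`, `conv2d_same_grad_w`, `predict`, `mse`, `train`) are hypothetical.

```python
import numpy as np


def conv2d_same(x, w):
    """Multi-channel 2-D cross-correlation with zero 'same' padding.

    x: (C_in, H, W) input; w: (C_out, C_in, k, k) kernel with odd k.
    Returns an array of shape (C_out, H, W).
    """
    k = w.shape[-1]
    pad = k // 2
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + h, j:j + wd]  # input window seen by tap (i, j)
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


def conv2d_same_grad_w(x, g, k):
    """Gradient of sum(g * conv2d_same(x, w)) with respect to w."""
    pad = k // 2
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    grad = np.zeros((g.shape[0], x.shape[0], k, k))
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + h, j:j + wd]
            grad[:, :, i, j] = np.einsum("ohw,chw->oc", g, patch)
    return grad


def predict(frames, w, b):
    """Map past frames (T_in, C, H, W) to future frames (T_out, C, H, W)."""
    t_in, c, h, wd = frames.shape
    x = frames.reshape(t_in * c, h, wd)  # assumption: stack time steps along channels
    y = conv2d_same(x, w) + b[:, None, None]
    return y.reshape(-1, c, h, wd)


def mse(pred, target):
    """Pixel-wise mean squared error over every predicted frame."""
    return float(np.mean((pred - target) ** 2))


def train(frames, target, k=3, lr=0.05, steps=300):
    """Plain gradient descent on the MSE loss, end to end from pixels to pixels."""
    t_in, c, h, wd = frames.shape
    c_out = target.shape[0] * c
    w = np.zeros((c_out, t_in * c, k, k))
    b = np.zeros(c_out)
    x = frames.reshape(t_in * c, h, wd)
    history = []
    for _ in range(steps):
        pred = predict(frames, w, b)
        history.append(mse(pred, target))
        # dL/dpred for L = mean((pred - target)^2)
        g = 2.0 * (pred - target).reshape(c_out, h, wd) / pred.size
        w -= lr * conv2d_same_grad_w(x, g, k)
        b -= lr * g.sum(axis=(1, 2))
    return w, b, history


# Toy clip: a random 8x8 pattern that drifts one pixel to the right per frame.
rng = np.random.default_rng(0)
f0 = rng.random((1, 8, 8))                                   # (C, H, W)
clip = np.stack([np.roll(f0, s, axis=2) for s in range(3)])  # (3, C, H, W)
frames, target = clip[:2], clip[2:]                          # 2 past -> 1 future
w, b, history = train(frames, target)
print(f"MSE: {history[0]:.4f} -> {history[-1]:.4f}")
```

Because this toy model is a single linear convolution, the MSE objective is a convex quadratic in the weights, so gradient descent with a small enough step size lowers the loss at every step. A practical model would stack nonlinear convolutional layers and rely on an automatic differentiation framework instead of a hand-written gradient.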