Paper Title

MSPred: Video Prediction at Multiple Spatio-Temporal Scales with Hierarchical Recurrent Networks

Paper Authors

Angel Villar-Corrales, Ani Karapetyan, Andreas Boltres, Sven Behnke

Paper Abstract

Autonomous systems not only need to understand their current environment, but should also be able to predict future actions conditioned on past states, for instance based on captured camera frames. However, existing models mainly focus on forecasting future video frames for short time-horizons, hence being of limited use for long-term action planning. We propose Multi-Scale Hierarchical Prediction (MSPred), a novel video prediction model able to simultaneously forecast future possible outcomes of different levels of granularity at different spatio-temporal scales. By combining spatial and temporal downsampling, MSPred efficiently predicts abstract representations such as human poses or locations over long time horizons, while still maintaining competitive performance for video frame prediction. In our experiments, we demonstrate that MSPred accurately predicts future video frames as well as high-level representations (e.g., keypoints or semantics) on bin-picking and action recognition datasets, while consistently outperforming popular approaches for future frame prediction. Furthermore, we ablate different modules and design choices in MSPred, experimentally validating that combining features of different spatial and temporal granularity leads to superior performance. Code and models to reproduce our experiments can be found at https://github.com/AIS-Bonn/MSPred.
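
The abstract describes the core architectural idea only at a high level. Below is a minimal PyTorch sketch, not the authors' released code, of how a hierarchical recurrent predictor can combine spatial and temporal downsampling: each level of the hierarchy receives spatially coarser features and updates its recurrent state at a longer temporal stride. All module names, layer sizes, and the per-level update periods are illustrative assumptions; the actual MSPred implementation is available at the repository linked above.

```python
# Minimal sketch (not the authors' code) of the multi-scale idea described in the
# abstract: a shared convolutional encoder produces features at several spatial
# resolutions, and each level is driven by its own recurrent cell that updates at
# a coarser temporal stride the higher it sits in the hierarchy.
# All names, sizes, and the period-per-level scheme are illustrative assumptions.
import torch
import torch.nn as nn


class MultiScaleRecurrentPredictor(nn.Module):
    def __init__(self, in_channels=3, hidden=64, periods=(1, 4, 8)):
        super().__init__()
        self.periods = periods  # temporal stride per level (assumed values)
        # Encoder stages: each halves the spatial resolution.
        self.stages = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels if i == 0 else hidden, hidden, 3, stride=2, padding=1),
                nn.ReLU(),
            )
            for i in range(len(periods))
        ])
        # Convolutional recurrent cells would be closer to the paper; a plain
        # GRUCell on globally pooled features keeps this sketch short.
        self.cells = nn.ModuleList([nn.GRUCell(hidden, hidden) for _ in periods])

    def forward(self, frames):
        # frames: (batch, time, channels, height, width)
        b, t, _, _, _ = frames.shape
        states = [torch.zeros(b, c.hidden_size, device=frames.device) for c in self.cells]
        outputs = [[] for _ in self.periods]
        for step in range(t):
            x = frames[:, step]
            for level, (stage, cell, period) in enumerate(zip(self.stages, self.cells, self.periods)):
                x = stage(x)                     # coarser spatial scale at each level
                if step % period == 0:           # coarser temporal scale at each level
                    pooled = x.mean(dim=(2, 3))  # global average pooling
                    states[level] = cell(pooled, states[level])
                    outputs[level].append(states[level])
        # Each level's outputs would feed its own decoder head (frames, keypoints, ...).
        return [torch.stack(o, dim=1) for o in outputs]


if __name__ == "__main__":
    model = MultiScaleRecurrentPredictor()
    video = torch.randn(2, 16, 3, 64, 64)  # toy clip: 2 samples, 16 frames
    for level, feats in enumerate(model(video)):
        print(f"level {level}: {feats.shape[1]} updates, feature dim {feats.shape[2]}")
```

In this sketch, the lowest level updates every frame and would drive video frame prediction, while the higher levels update less frequently and would feed decoders for abstract outputs such as keypoints or locations, mirroring the multi-scale scheme described in the abstract.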
