Paper Title
Playable Environments: Video Manipulation in Space and Time
Paper Authors
Paper Abstract
We present Playable Environments - a new representation for interactive video generation and manipulation in space and time. With a single image at inference time, our novel framework allows the user to move objects in 3D while generating a video by providing a sequence of desired actions. The actions are learnt in an unsupervised manner. The camera can be controlled to get the desired viewpoint. Our method builds an environment state for each frame, which can be manipulated by our proposed action module and decoded back to the image space with volumetric rendering. To support diverse appearances of objects, we extend neural radiance fields with style-based modulation. Our method trains on a collection of various monocular videos, requiring only the estimated camera parameters and 2D object locations. To set a challenging benchmark, we introduce two large-scale video datasets with significant camera movements. As evidenced by our experiments, playable environments enable several creative applications not attainable by prior video synthesis works, including playable 3D video generation, stylization and manipulation. Further details, code and examples are available at https://willi-menapace.github.io/playable-environments-website
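To make the per-frame pipeline described in the abstract concrete, below is a minimal PyTorch sketch of the three pieces it names: an environment state updated by an action module, and a NeRF-like decoder with style-based modulation that is read out with volumetric rendering. All module names, dimensions, and the exact modulation scheme are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the pipeline in the abstract: environment state ->
# action module -> style-modulated NeRF decoder. Names and sizes are assumed.

import torch
import torch.nn as nn


class ActionModule(nn.Module):
    """Updates each object's state given a discrete action id (learnt without supervision)."""

    def __init__(self, state_dim: int = 64, num_actions: int = 4):
        super().__init__()
        self.action_emb = nn.Embedding(num_actions, state_dim)
        self.update = nn.Sequential(
            nn.Linear(2 * state_dim, state_dim), nn.ReLU(),
            nn.Linear(state_dim, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # state: (num_objects, state_dim); action: (num_objects,) integer ids
        a = self.action_emb(action)
        return state + self.update(torch.cat([state, a], dim=-1))


class StyleModulatedNeRF(nn.Module):
    """NeRF-like MLP whose hidden features are modulated by a per-object style code."""

    def __init__(self, style_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.pos_fc = nn.Linear(3, hidden)
        self.mod = nn.Linear(style_dim, 2 * hidden)   # per-channel scale and shift
        self.out = nn.Linear(hidden, 4)               # RGB + density per 3D sample

    def forward(self, xyz: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.pos_fc(xyz))
        scale, shift = self.mod(style).chunk(2, dim=-1)
        h = torch.relu(h * (1 + scale) + shift)       # style-based modulation
        return self.out(h)                            # integrated by volumetric rendering


if __name__ == "__main__":
    action_module = ActionModule()
    nerf = StyleModulatedNeRF()
    state = torch.randn(2, 64)                        # two objects in the environment state
    state = action_module(state, torch.tensor([1, 3]))
    rgb_sigma = nerf(torch.randn(1024, 3), state[0].expand(1024, -1))
    print(rgb_sigma.shape)                            # (1024, 4): color and density samples
```

In this sketch the same state vector doubles as the style code; in practice the environment state would also carry explicit 3D object poses so that the desired camera viewpoint and object motion can be rendered.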