Paper Title
Learning Disentangled Representations for Controllable Human Motion Prediction
Paper Authors
Abstract
Generative model-based motion prediction techniques have recently realized predicting controlled human motions, such as predicting multiple upper human body motions with similar lower-body motions. However, to achieve this, the state-of-the-art methods require either subsequently learning mapping functions to seek similar motions or training the model repetitively to enable control over the desired portion of the body. In this paper, we propose a novel framework to learn disentangled representations for controllable human motion prediction. Our network involves a conditional variational auto-encoder (CVAE) architecture to model full-body human motion, and an extra CVAE path to learn only the corresponding partial-body (e.g., lower-body) motion. Specifically, the inductive bias imposed by the extra CVAE path encourages the two latent variables in the two paths to respectively govern separate representations for each partial-body motion. With a single training, our model is able to provide two types of control over the generated human motions: (i) strictly controlling one portion of the human body and (ii) adaptively controlling the other portion, by sampling from a pair of latent spaces. Additionally, we extend and adapt a sampling strategy to our trained model to diversify the controllable predictions. Our framework also potentially allows new forms of control by flexibly customizing the input to the extra CVAE path. Extensive experimental results and ablation studies demonstrate that, both qualitatively and quantitatively, our approach achieves state-of-the-art controllable human motion prediction.
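The abstract's core idea of sampling from a pair of latent spaces, one governing the strictly controlled partial-body motion and one governing the adaptively generated remainder, can be illustrated with a minimal sketch. This is not the authors' implementation: the dimensions, the linear encoders/decoder, and all weight values are hypothetical stand-ins for a trained dual-path CVAE; only the structure (an extra partial-body encoding path whose latent is paired with the full-body latent at decoding time) mirrors the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 48-D full-body pose, 24-D lower-body pose, 8-D latents
D_FULL, D_PART, D_Z = 48, 24, 8

# Randomly initialized weights stand in for trained CVAE parameters
W_enc_full = rng.standard_normal((D_FULL, 2 * D_Z)) * 0.1
b_enc_full = np.zeros(2 * D_Z)
W_enc_part = rng.standard_normal((D_PART, 2 * D_Z)) * 0.1
b_enc_part = np.zeros(2 * D_Z)
# Decoder consumes the latent pair: z_part (strict control) + z_full (adaptive)
W_dec = rng.standard_normal((2 * D_Z, D_FULL)) * 0.1
b_dec = np.zeros(D_FULL)

def encode(x, w, b):
    """Map an input pose to the mean and log-variance of its latent Gaussian."""
    h = x @ w + b
    return h[..., :D_Z], h[..., D_Z:]

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (the reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def predict(x_full, x_part):
    """Decode a full-body pose from the pair of latent variables.

    Fixing z_part across calls would keep the controlled portion similar,
    while resampling z_full varies the rest of the body.
    """
    z_full = reparameterize(*encode(x_full, W_enc_full, b_enc_full))
    z_part = reparameterize(*encode(x_part, W_enc_part, b_enc_part))
    return np.concatenate([z_part, z_full], axis=-1) @ W_dec + b_dec

x_full = rng.standard_normal((1, D_FULL))
x_part = x_full[:, :D_PART]  # e.g., the lower-body subset of the pose vector
y = predict(x_full, x_part)
print(y.shape)  # (1, 48)
```

In a real model the encoders and decoder would be deep sequence networks conditioned on past motion, and the disentanglement would come from training both paths jointly so that the partial-body latent alone reconstructs the partial-body motion.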