Paper Title

Copy Motion From One to Another: Fake Motion Video Generation

Paper Authors

Zhenguang Liu, Sifan Wu, Chejian Xu, Xiang Wang, Lei Zhu, Shuang Wu, Fuli Feng

Paper Abstract


One compelling application of artificial intelligence is to generate a video of a target person performing arbitrary desired motion (from a source person). While the state-of-the-art methods are able to synthesize a video demonstrating similar broad-stroke motion details, they are generally lacking in texture details. A pertinent manifestation appears as distorted faces, feet, and hands, and such flaws are perceived very sensitively by human observers. Furthermore, current methods typically employ GANs with an L2 loss to assess the authenticity of the generated videos, inherently requiring a large amount of training samples to learn the texture details for adequate video generation. In this work, we tackle these challenges from three aspects: 1) We disentangle each video frame into foreground (the person) and background, focusing on generating the foreground to reduce the underlying dimension of the network output. 2) We propose a theoretically motivated Gromov-Wasserstein loss that facilitates learning the mapping from a pose to a foreground image. 3) To enhance texture details, we encode facial features with geometric guidance and employ local GANs to refine the face, feet, and hands. Extensive experiments show that our method is able to generate realistic target person videos, faithfully copying complex motions from a source person.
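The Gromov-Wasserstein loss mentioned in point 2) compares the intra-domain distance structures of two sets (e.g. pose keypoints versus foreground image features) rather than cross-domain distances, so the two spaces need not be directly comparable. As a rough NumPy sketch of the idea only (not the paper's implementation; the point sets, features, and coupling `T` here are illustrative assumptions), the discrepancy for a fixed coupling can be computed as:

```python
import numpy as np

def pairwise_sq_dists(X):
    # Intra-set squared Euclidean distance matrix (n x n).
    sq = np.sum(X**2, axis=1)
    return sq[:, None] + sq[None, :] - 2 * X @ X.T

def gromov_wasserstein_cost(X, Y, T):
    """Gromov-Wasserstein discrepancy for a given coupling T (n x m).

    Computes sum_{i,j,k,l} (C1[i,k] - C2[j,l])**2 * T[i,j] * T[k,l],
    where C1, C2 are intra-domain distance matrices of X and Y.
    Illustrative sketch only; real GW solvers also optimize over T.
    """
    C1 = pairwise_sq_dists(X)  # structure of domain X (e.g. poses)
    C2 = pairwise_sq_dists(Y)  # structure of domain Y (e.g. images)
    p = T.sum(axis=1)          # marginal of T over X
    q = T.sum(axis=0)          # marginal of T over Y
    # Expand (a - b)^2 = a^2 - 2ab + b^2 to avoid the 4-index loop.
    term1 = (C1**2 @ p) @ p
    term2 = (C2**2 @ q) @ q
    term3 = 2 * np.sum((C1 @ T @ C2.T) * T)
    return term1 + term2 - term3
```

With an identity-like coupling and identical point sets, the cost is zero, since the two distance structures match exactly; scaling one set apart drives the cost positive. In practice such a loss is made differentiable and minimized alongside the GAN objectives.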
