Paper Title


Pose Guided Human Image Synthesis with Partially Decoupled GAN

Paper Authors

Wu, Jianhan, Wang, Jianzong, Si, Shijing, Qu, Xiaoyang, Xiao, Jing

Paper Abstract

Pose Guided Human Image Synthesis (PGHIS) is a challenging task of transforming a human image from the reference pose to a target pose while preserving its style. Most existing methods encode the texture of the whole reference human image into a latent space, and then utilize a decoder to synthesize the image texture of the target pose. However, it is difficult to recover the detailed texture of the whole human image. To alleviate this problem, we propose a method that decouples the human body into several parts (e.g., hair, face, hands, feet, etc.) and then uses each of these parts to guide the synthesis of a realistic image of the person, which preserves the detailed information of the generated images. In addition, we design a multi-head attention-based module for PGHIS. Because most convolutional neural network-based methods have difficulty modeling long-range dependency due to the convolutional operation, the long-range modeling capability of the attention mechanism is better suited than convolutional neural networks to the pose transfer task, especially for sharp pose deformation. Extensive experiments on the Market-1501 and DeepFashion datasets reveal that our method outperforms other existing state-of-the-art methods on nearly all qualitative and quantitative metrics.
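The abstract's argument for attention over convolution is that each output position can attend to every input position in a single step. As a rough illustration of that long-range modeling idea (not the paper's actual module; all function and variable names here are illustrative assumptions), a minimal multi-head scaled dot-product attention can be sketched in NumPy:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(q, k, v, num_heads):
    """Scaled dot-product attention with several heads.

    q, k, v: (seq_len, d_model) arrays; d_model must be divisible by num_heads.
    Returns an array of shape (seq_len, d_model).
    """
    seq_len, d_model = q.shape
    d_head = d_model // num_heads

    # Split the model dimension into heads: (num_heads, seq_len, d_head).
    def split(x):
        return x.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    qh, kh, vh = split(q), split(k), split(v)
    # The score matrix relates every position to every other position in one
    # step -- the long-range dependency that stacked convolutions lack.
    scores = qh @ kh.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    out = weights @ vh  # (num_heads, seq_len, d_head)
    return out.transpose(1, 0, 2).reshape(seq_len, d_model)

# Hypothetical usage: target-pose features act as queries attending over
# reference texture features (e.g., of the decoupled body parts).
rng = np.random.default_rng(0)
pose_feat = rng.standard_normal((16, 64))  # assumed target-pose features
tex_feat = rng.standard_normal((16, 64))   # assumed reference-texture features
fused = multi_head_attention(pose_feat, tex_feat, tex_feat, num_heads=4)
print(fused.shape)  # (16, 64)
```

In practice such a module would use learned query/key/value projections and sit inside the generator; this sketch only shows why attention weights give every target-pose location direct access to all reference locations.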
