Paper Title

Towards Fine-grained Human Pose Transfer with Detail Replenishing Network

Authors

Lingbo Yang, Pan Wang, Chang Liu, Zhanning Gao, Peiran Ren, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Xiansheng Hua, Wen Gao

Abstract

Human pose transfer (HPT) is an emerging research topic with huge potential in fashion design, media production, online advertising and virtual reality. For these applications, the visual realism of fine-grained appearance details is crucial for production quality and user engagement. However, existing HPT methods often suffer from three fundamental issues: detail deficiency, content ambiguity and style inconsistency, which severely degrade the visual quality and realism of generated images. Aiming towards real-world applications, we develop a more challenging yet practical HPT setting, termed Fine-grained Human Pose Transfer (FHPT), with a higher focus on semantic fidelity and detail replenishment. Concretely, we analyze the potential design flaws of existing methods via an illustrative example, and establish the core FHPT methodology by combining the ideas of content synthesis and feature transfer in a mutually-guided fashion. Thereafter, we substantiate the proposed methodology with a Detail Replenishing Network (DRN) and a corresponding coarse-to-fine model training scheme. Moreover, we build up a complete suite of fine-grained evaluation protocols to address the challenges of FHPT in a comprehensive manner, including semantic analysis, structural detection and perceptual quality assessment. Extensive experiments on the DeepFashion benchmark dataset have verified the power of the proposed method against state-of-the-art works, with a 12\%-14\% gain on top-10 retrieval recall, 5\% higher joint localization accuracy, and a nearly 40\% gain on face identity preservation. Moreover, the evaluation results offer further insights into the subject matter, which could inspire many promising future works along this direction.
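The abstract reports gains on top-10 retrieval recall, a standard metric for judging whether a generated image preserves enough appearance detail to retrieve its source. As a rough illustration only (not the paper's actual evaluation protocol), a minimal sketch of how top-k retrieval recall is typically computed from embedding vectors might look like this; the function name, embedding shapes, and toy data are all assumptions for the example:

```python
import numpy as np

def top_k_retrieval_recall(query_emb, gallery_emb, gt_indices, k=10):
    """Fraction of queries whose ground-truth gallery item appears
    among the k most similar gallery embeddings (cosine similarity)."""
    # Normalize rows so the dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = q @ g.T                              # (num_queries, num_gallery)
    top_k = np.argsort(-sims, axis=1)[:, :k]    # indices of the k nearest items
    hits = (top_k == np.asarray(gt_indices)[:, None]).any(axis=1)
    return hits.mean()

# Toy example: 3 queries that are near-duplicates of gallery items 0, 2, 4
rng = np.random.default_rng(0)
gallery = rng.normal(size=(5, 8))
queries = gallery[[0, 2, 4]] + 0.01 * rng.normal(size=(3, 8))
print(top_k_retrieval_recall(queries, gallery, gt_indices=[0, 2, 4], k=1))
```

In a generation setting, the "queries" would be embeddings of generated images and the "gallery" embeddings of real images, so higher recall indicates that generated images retain more identity-discriminative detail.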
