KPE: Keypoint Pose Encoding for Transformer-based Image Generation

Paper title

KPE: Keypoint Pose Encoding for Transformer-based Image Generation

Authors

Soon Yau Cheong, Armin Mustafa, Andrew Gilbert

Abstract

Transformers have recently been shown to generate high-quality images from text input. However, the existing method of pose conditioning, which uses skeleton image tokens, is computationally inefficient and generates low-quality images. We therefore propose a new method, Keypoint Pose Encoding (KPE). KPE is 10 times more memory efficient and over 73% faster at generating high-quality images from text input conditioned on the pose. The pose constraint improves image quality and reduces errors on body extremities such as arms and legs. Additional benefits include invariance to changes in the target image domain and image resolution, which makes KPE easily scalable to higher-resolution images. We demonstrate the versatility of KPE by generating photorealistic multi-person images derived from the DeepFashion dataset. We also introduce an evaluation method, People Count Error (PCE), that is effective in detecting errors in generated human images.
