Style Transformer for Image Inversion and Editing

Paper Title

Style Transformer for Image Inversion and Editing

Authors

Xueqi Hu, Qiusheng Huang, Zhengyi Shi, Siyuan Li, Changxin Gao, Li Sun, Qingli Li

Abstract

Existing GAN inversion methods fail to provide latent codes that support reliable reconstruction and flexible editing simultaneously. This paper presents a transformer-based image inversion and editing model for pretrained StyleGAN that yields less distortion while offering high-quality, flexible editing. The proposed model employs a CNN encoder to provide multi-scale image features as keys and values, and regards the style codes to be determined for the different layers of the generator as queries. It first initializes the query tokens as learnable parameters and maps them into the W+ space. Multi-stage alternating self- and cross-attention is then used to update the queries, with the aim of inverting the input image through the generator. Moreover, based on the inverted code, we investigate reference- and label-based attribute editing through a pretrained latent classifier, and achieve flexible image-to-image translation with high-quality results. Extensive experiments show better performance on both inversion and editing tasks within StyleGAN.
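
The query-update idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`attention`, `invert_sketch`), the token count, the feature sizes, the residual updates, and the omission of learned projections, normalization, and the mapping into W+ are all assumptions made for illustration. It only shows style queries being refined by alternating self-attention and cross-attention against one feature map per stage.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Single-head scaled dot-product attention, no learned projections
    # (an assumption made to keep the sketch short).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def invert_sketch(features_per_stage, queries):
    # features_per_stage: list of (num_positions, dim) arrays, one per stage,
    #   standing in for the multi-scale CNN encoder features (keys and values).
    # queries: (num_styles, dim) array, one token per generator layer.
    for feats in features_per_stage:
        # Self-attention: the style queries exchange information.
        queries = queries + attention(queries, queries, queries)
        # Cross-attention: the queries read from the image features.
        queries = queries + attention(queries, feats, feats)
    return queries

rng = np.random.default_rng(0)
dim, num_styles = 64, 18  # hypothetical sizes, not taken from the paper
queries = rng.normal(size=(num_styles, dim))  # stand-in for learnable tokens
features = [rng.normal(size=(n, dim)) for n in (16 * 16, 32 * 32, 64 * 64)]
codes = invert_sketch(features, queries)
print(codes.shape)  # (18, 64)
```

In a trained model the resulting codes would be fed to the frozen generator and compared with the input image; here the inputs are random, so only the data flow is meaningful.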
