Paper Title

Semantic Image Synthesis with Semantically Coupled VQ-Model

Paper Authors

Stephan Alaniz, Thomas Hummel, Zeynep Akata

Paper Abstract

Semantic image synthesis enables control over unconditional image generation by allowing guidance on what is being generated. We conditionally synthesize the latent space from a vector quantized model (VQ-model) pre-trained to autoencode images. Instead of training an autoregressive Transformer on separately learned conditioning latents and image latents, we find that jointly learning the conditioning and image latents significantly improves the modeling capabilities of the Transformer model. While our jointly trained VQ-model achieves a similar reconstruction performance to a vanilla VQ-model for both semantic and image latents, tying the two modalities at the autoencoding stage proves to be an important ingredient to improve autoregressive modeling performance. We show that our model improves semantic image synthesis using autoregressive models on the popular semantic image datasets ADE20K, Cityscapes, and COCO-Stuff.
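
The following is a minimal sketch of the two-stage idea described in the abstract, assuming a standard VQ-VAE-style setup in PyTorch. Every module name, layer shape, and hyperparameter below is a hypothetical illustration, not the authors' exact architecture; sharing one codebook between the two encoders and letting each decoder see both quantized latents is one plausible way to realize "tying the two modalities at the autoencoding stage".

```python
# Sketch of a semantically coupled VQ autoencoder (illustrative only;
# not the paper's exact architecture).
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with a straight-through gradient."""
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                                  # z: (B, C, H, W)
        b, c, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, c)        # (B*H*W, C)
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=1)
        z_q = self.codebook(idx).view(b, h, w, c).permute(0, 3, 1, 2)
        z_q = z + (z_q - z).detach()                       # straight-through estimator
        return z_q, idx.view(b, h * w)                     # latents + discrete token ids

class CoupledVQModel(nn.Module):
    """Jointly autoencodes an image and its semantic map so both modalities
    are learned together rather than in two separate VQ-models."""
    def __init__(self, num_classes=151, dim=64):           # e.g. 150 ADE20K classes + background
        super().__init__()
        self.enc_img = nn.Conv2d(3, dim, kernel_size=4, stride=4)
        self.enc_sem = nn.Conv2d(num_classes, dim, kernel_size=4, stride=4)
        self.quant = VectorQuantizer(dim=dim)              # shared codebook couples modalities
        self.dec_img = nn.ConvTranspose2d(2 * dim, 3, kernel_size=4, stride=4)
        self.dec_sem = nn.ConvTranspose2d(2 * dim, num_classes, kernel_size=4, stride=4)

    def forward(self, image, sem_onehot):
        zq_i, tok_i = self.quant(self.enc_img(image))
        zq_s, tok_s = self.quant(self.enc_sem(sem_onehot))
        z = torch.cat([zq_i, zq_s], dim=1)                 # each decoder sees both latents
        return self.dec_img(z), self.dec_sem(z), tok_s, tok_i

# Usage: reconstruct both modalities and collect the discrete tokens.
model = CoupledVQModel()
image = torch.randn(1, 3, 64, 64)
sem = torch.randn(1, 151, 64, 64)                          # stand-in for a one-hot semantic map
img_rec, sem_rec, tok_s, tok_i = model(image, sem)
```

In a second stage, an autoregressive Transformer would then model the image tokens conditioned on the semantic tokens, for instance by maximizing the likelihood of the concatenated sequence of tok_s followed by tok_i.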
