MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation

Paper Title

MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation

Authors

Chuanxia Zheng, Long Tung Vuong, Jianfei Cai, Dinh Phung

Abstract

Although two-stage Vector Quantized (VQ) generative models allow for synthesizing high-fidelity and high-resolution images, their quantization operator encodes similar patches within an image into the same index, resulting in a repeated artifact for similar adjacent regions using existing decoder architectures. To address this issue, we propose to incorporate the spatially conditional normalization to modulate the quantized vectors so as to insert spatially variant information into the embedded index maps, encouraging the decoder to generate more photorealistic images. Moreover, we use multichannel quantization to increase the recombination capability of the discrete codes without increasing the cost of model and codebook. Additionally, to generate discrete tokens at the second stage, we adopt a Masked Generative Image Transformer (MaskGIT) to learn an underlying prior distribution in the compressed latent space, which is much faster than the conventional autoregressive model. Experiments on two benchmark datasets demonstrate that our proposed modulated VQGAN is able to greatly improve the reconstructed image quality as well as provide high-fidelity image generation.
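
The two ideas named in the abstract, multichannel quantization and spatially conditional normalization, can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the function names, the single-channel activations, the linear maps `w_gamma` and `w_beta`, and the `1 + ...` form of the scale are all assumptions made for illustration, and the abstract does not specify how the modulation parameters are computed.

```python
import math

def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared L2 distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((v - c) ** 2 for v, c in zip(vec, codebook[k])),
    )

def multichannel_quantize(vec, codebook, chunks):
    """Split vec into `chunks` equal parts and quantize each part with one shared codebook.

    Assumption: equal-sized chunks along the channel axis; the codebook is reused
    for every chunk, so its size does not grow with the number of chunks.
    """
    size = len(vec) // chunks
    return [nearest_code(vec[i * size:(i + 1) * size], codebook) for i in range(chunks)]

def modulate(activations, z_q, w_gamma, w_beta, eps=1e-5):
    """Normalize activations across positions, then scale and shift each position
    with parameters derived from that position's quantized vector.

    activations: one float per spatial position (a single channel, for brevity)
    z_q:         one quantized vector per spatial position
    w_gamma, w_beta: hypothetical linear maps standing in for learned layers
    """
    mean = sum(activations) / len(activations)
    var = sum((a - mean) ** 2 for a in activations) / len(activations)
    out = []
    for a, z in zip(activations, z_q):
        gamma = 1.0 + sum(w * x for w, x in zip(w_gamma, z))
        beta = sum(w * x for w, x in zip(w_beta, z))
        out.append(gamma * (a - mean) / math.sqrt(var + eps) + beta)
    return out

# Toy data: a two-entry codebook of 2-dimensional codes.
codebook = [[0.0, 0.0], [1.0, 1.0]]

# A 4-dimensional latent is split into two chunks, each mapped to its own index.
indices = multichannel_quantize([0.1, -0.1, 0.9, 1.2], codebook, chunks=2)

# Two neighbouring positions share the same code but carry different activations.
z_q = [codebook[1], codebook[1]]
modulated = modulate([1.0, 3.0], z_q, w_gamma=[0.5, 0.0], w_beta=[0.0, 0.25])
```

In this toy setting both positions receive the same scale and shift because they share a code, yet their outputs differ because the normalized activations differ. That is one way to read the abstract's claim that modulation lets spatially variant information survive identical indices.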
