Paper Title
Generating Annotated High-Fidelity Images Containing Multiple Coherent Objects
Paper Authors
Paper Abstract
Recent developments in generative models have made it possible to generate diverse high-fidelity images. In particular, layout-to-image generation models have gained significant attention due to their capability to generate realistic, complex images containing distinct objects. These models are generally conditioned on either semantic layouts or textual descriptions. However, unlike for natural images, providing such auxiliary information can be extremely hard in domains such as biomedical imaging and remote sensing. In this work, we propose a multi-object generation framework that can synthesize images with multiple objects without explicitly requiring their contextual information during the generation process. Based on a vector-quantized variational autoencoder (VQ-VAE) backbone, our model learns to preserve spatial coherency within an image, as well as semantic coherency between the objects and the background, through two powerful autoregressive priors: PixelSNAIL and LayoutPixelSNAIL. While PixelSNAIL learns the distribution of the latent encodings of the VQ-VAE, LayoutPixelSNAIL is used to specifically learn the semantic distribution of the objects. An implicit advantage of our approach is that the generated samples are accompanied by object-level annotations. We demonstrate how coherency and fidelity are preserved with our method through experiments on the Multi-MNIST and CLEVR datasets, thereby outperforming state-of-the-art multi-object generative methods. The efficacy of our approach is further demonstrated through applications to medical imaging datasets, where we show that augmenting the training set with samples generated by our approach improves the performance of existing models.
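To make the VQ-VAE backbone concrete: the encoder produces continuous latents that are snapped to their nearest entry in a learned codebook, and the autoregressive priors (PixelSNAIL and LayoutPixelSNAIL in this work) then model the distribution over these discrete codes. Below is a minimal, hypothetical sketch of that quantization step only, using NumPy and a toy codebook; the function name `quantize` and all values are illustrative, not from the paper.

```python
import numpy as np

def quantize(z_e, codebook):
    """Map each encoder output vector to its nearest codebook entry.

    z_e: (N, D) array of continuous latents from the encoder.
    codebook: (K, D) array of learned embedding vectors.
    Returns (indices, z_q): discrete codes and the quantized latents.
    """
    # Squared Euclidean distance between every latent and every code,
    # computed via broadcasting: (N, 1, D) - (1, K, D) -> (N, K, D).
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)   # (N,) discrete latent codes
    z_q = codebook[indices]          # (N, D) quantized latents
    return indices, z_q

# Toy example: 4 latents and a codebook of 3 entries in 2-D.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
z_e = np.array([[0.1, -0.1], [0.9, 1.2], [-0.8, 0.9], [0.05, 0.0]])
indices, z_q = quantize(z_e, codebook)
print(indices)  # each latent snaps to its nearest code: [0 1 2 0]
```

A prior such as PixelSNAIL is trained on grids of these `indices`; sampling new index grids and decoding them through the VQ-VAE decoder yields new images.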