Interactive Image Synthesis with Panoptic Layout Generation

Paper Title

Interactive Image Synthesis with Panoptic Layout Generation

Authors

Bo Wang, Tao Wu, Minfeng Zhu, Peng Du

Abstract

Interactive image synthesis from user-guided input is a challenging task when users wish to control the scene structure of a generated image with ease. Although remarkable progress has been made on layout-based image synthesis approaches, existing methods require high-precision inputs to produce realistic images in interactive settings; these inputs often need several rounds of adjustment and are unfriendly to novice users. When the placement of bounding boxes is subject to perturbation, layout-based models suffer from "missing regions" in the constructed semantic layouts and hence from undesirable artifacts in the generated images. In this work, we propose Panoptic Layout Generative Adversarial Networks (PLGAN) to address this challenge. PLGAN employs panoptic theory, which distinguishes object categories between "stuff" with amorphous boundaries and "things" with well-defined shapes, such that stuff and instance layouts are constructed through separate branches and later fused into panoptic layouts. In particular, the stuff layouts can take amorphous shapes and fill up the missing regions left out by the instance layouts. We experimentally compare PLGAN with state-of-the-art layout-based models on the COCO-Stuff, Visual Genome, and Landscape datasets. The advantages of PLGAN are not only demonstrated visually but also verified quantitatively in terms of inception score, Fréchet inception distance, classification accuracy score, and coverage.
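
The "missing region" idea in the abstract can be illustrated with a toy sketch. This is not the PLGAN architecture, which learns both layouts with neural branches; it only shows, on a small label grid, how a layout built from a bounding box alone leaves cells unlabeled and how a dense stuff layout can fill them when the two are fused. The function names, the grid size, the labels, and the simple `coverage` ratio are all assumptions made for illustration, and `coverage` here is not necessarily the metric of that name used in the paper.

```python
# Toy illustration only: hypothetical helpers, not the authors' implementation.
# A layout is a grid of integer labels; 0 means "no label" (a missing region).

def instance_layout(height, width, boxes):
    """Rasterize (label, x0, y0, x1, y1) boxes; later boxes overwrite earlier ones."""
    grid = [[0] * width for _ in range(height)]
    for label, x0, y0, x1, y1 in boxes:
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                grid[y][x] = label
    return grid

def coverage(grid):
    """Fraction of cells that carry some label (toy definition)."""
    cells = [c for row in grid for c in row]
    return sum(1 for c in cells if c != 0) / len(cells)

def fuse(instance_grid, stuff_grid):
    """Keep the instance label where present; otherwise fall back to the stuff label."""
    return [
        [i if i != 0 else s for i, s in zip(irow, srow)]
        for irow, srow in zip(instance_grid, stuff_grid)
    ]

# Hypothetical 4x6 scene: label 1 is a "thing", labels 2 and 3 are "stuff".
H, W = 4, 6
things = instance_layout(H, W, [(1, 2, 1, 4, 4)])  # one box, most cells stay empty
stuff = [[2] * W for _ in range(2)] + [[3] * W for _ in range(2)]  # dense stuff layout
panoptic = fuse(things, stuff)

print(coverage(things))    # share of cells labeled by the box alone
print(coverage(panoptic))  # share of cells labeled after fusion
```

In this sketch the instance label takes priority wherever it exists, so the stuff layout only affects the cells the box left empty; that priority rule is a design choice of the example, not a claim about how PLGAN fuses its branches.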

Paper Title