Geometry Aligned Variational Transformer for Image-conditioned Layout Generation

Paper title

Geometry Aligned Variational Transformer for Image-conditioned Layout Generation

Authors

Cao, Yunning; Ma, Ye; Zhou, Min; Liu, Chuanbin; Xie, Hongtao; Ge, Tiezheng; Jiang, Yuning

Abstract

Layout generation is a novel task in computer vision that combines the challenges of object localization and aesthetic appraisal, and it is widely used in advertisement, poster, and slide design. An accurate and pleasing layout should consider both the intra-domain relationship among layout elements and the inter-domain relationship between layout elements and the image. However, most previous methods focus on image-content-agnostic layout generation, without leveraging the complex visual information in the image. To this end, we explore a novel paradigm entitled image-conditioned layout generation, which aims to add text overlays to an image in a semantically coherent manner. Specifically, we propose an Image-Conditioned Variational Transformer (ICVT) that autoregressively generates diverse layouts for an image. First, a self-attention mechanism is adopted to model the contextual relationship among layout elements, while a cross-attention mechanism is used to fuse the visual information of the conditional image. We take these as building blocks of a conditional variational autoencoder (CVAE), which demonstrates appealing diversity. Second, to narrow the gap between the layout element domain and the visual domain, we design a Geometry Alignment module, in which the geometric information of the image is aligned with the layout representation. In addition, we construct a large-scale advertisement poster layout design dataset with fine-grained layout and saliency map annotations. Experimental results show that our model can adaptively generate layouts in the non-intrusive areas of the image, resulting in a harmonious layout design.
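
As a rough illustration of the two attention roles named in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes bare scaled dot-product attention with no learned projections, heads, or masking, and it leaves out the CVAE and the Geometry Alignment module entirely. The names `attention`, `layout_tokens`, and `image_patches`, and the toy feature values, are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over plain lists of feature vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


# Hypothetical toy inputs: two layout elements and three image patches,
# each represented by a 2-dimensional feature vector.
layout_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_patches = [[1.0, 1.0], [0.5, -0.5], [-1.0, 0.0]]

# Self-attention: layout elements attend to one another.
self_out = attention(layout_tokens, layout_tokens, layout_tokens)

# Cross-attention: layout elements query the image features.
cross_out = attention(self_out, image_patches, image_patches)
```

In this sketch the only difference between the two calls is where the keys and values come from: the layout elements themselves for self-attention, and the image features for cross-attention.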
