Paper Title
BlobGAN: Spatially Disentangled Scene Representations
Paper Authors
Paper Abstract
We propose an unsupervised, mid-level representation for a generative model of scenes. The representation is mid-level in that it is neither per-pixel nor per-image; rather, scenes are modeled as a collection of spatial, depth-ordered "blobs" of features. Blobs are differentiably placed onto a feature grid that is decoded into an image by a generative adversarial network. Due to the spatial uniformity of blobs and the locality inherent to convolution, our network learns to associate different blobs with different entities in a scene and to arrange these blobs to capture scene layout. We demonstrate this emergent behavior by showing that, despite training without any supervision, our method enables applications such as easy manipulation of objects within a scene (e.g., moving, removing, and restyling furniture), creation of feasible scenes given constraints (e.g., plausible rooms with drawers at a particular location), and parsing of real-world images into constituent parts. On a challenging multi-category dataset of indoor scenes, BlobGAN outperforms StyleGAN2 in image quality as measured by FID. See our project page for video results and interactive demo: https://www.dave.ml/blobgan
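The abstract's core mechanism, depth-ordered blobs differentiably splatted onto a feature grid, can be illustrated with a minimal sketch. This is not the paper's exact formulation: the function name, the isotropic-Gaussian opacity, and the front-to-back "over" compositing are illustrative assumptions.

```python
import numpy as np

def splat_blobs(centers, scales, depths, features, H=16, W=16):
    """Alpha-composite depth-ordered Gaussian "blobs" onto an H x W feature grid.

    centers:  (K, 2) blob centers in [0, 1] x [0, 1]
    scales:   (K,)   blob sizes (Gaussian std. dev., in normalized units)
    depths:   (K,)   depth scores; larger means closer to the camera
    features: (K, D) one feature vector per blob
    """
    ys, xs = np.meshgrid(np.linspace(0, 1, H), np.linspace(0, 1, W), indexing="ij")
    grid = np.zeros((H, W, features.shape[1]))  # accumulated per-pixel features
    remaining = np.ones((H, W))                 # transmittance for "over" compositing
    # Composite front-to-back: nearest blob first, so closer blobs occlude farther ones.
    for k in np.argsort(-depths):
        d2 = (xs - centers[k, 0]) ** 2 + (ys - centers[k, 1]) ** 2
        alpha = np.exp(-d2 / (2 * scales[k] ** 2))  # soft, differentiable opacity
        grid += (remaining * alpha)[..., None] * features[k]
        remaining *= 1 - alpha
    return grid
```

Because the opacity falloff is smooth, gradients flow to blob positions, sizes, and depths, which is what lets the generator rearrange blobs to capture scene layout during adversarial training; in the actual model this grid would then be decoded into an image by a StyleGAN2-style convolutional generator.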