Paper Title
Compositional Visual Generation and Inference with Energy Based Models
Paper Authors
Paper Abstract
A vital aspect of human intelligence is the ability to compose increasingly complex concepts out of simpler ideas, enabling both rapid learning and adaptation of knowledge. In this paper we show that energy-based models can exhibit this ability by directly combining probability distributions. Samples from the combined distribution correspond to compositions of concepts. For example, given a distribution for smiling faces, and another for male faces, we can combine them to generate smiling male faces. This allows us to generate natural images that simultaneously satisfy conjunctions, disjunctions, and negations of concepts. We evaluate compositional generation abilities of our model on the CelebA dataset of natural faces and synthetic 3D scene images. We also demonstrate other unique advantages of our model, such as the ability to continually learn and incorporate new concepts, or infer compositions of concept properties underlying an image.
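The composition operations the abstract describes have a simple probabilistic reading: an energy-based model defines p(x) ∝ exp(−E(x)), so a conjunction of concepts is a product of distributions (a sum of energies), a disjunction is a mixture (a log-sum-exp over negated energies), and a negation subtracts a scaled energy. The toy 1-D sketch below illustrates this with hypothetical quadratic concept energies and Langevin-dynamics sampling; the specific energy functions, step sizes, and noise scale are illustrative assumptions, not the paper's actual image models.

```python
import numpy as np

# Hypothetical 1-D "concept" energies (low energy = concept satisfied).
# These are illustrative stand-ins for the paper's learned image EBMs.
def e_concept_a(x):  # concept A prefers x near +1
    return (x - 1.0) ** 2

def e_concept_b(x):  # concept B prefers x near -1
    return (x + 1.0) ** 2

# Conjunction (A AND B): product of distributions = sum of energies.
def e_and(x):
    return e_concept_a(x) + e_concept_b(x)

# Disjunction (A OR B): mixture of distributions = -logsumexp of negated energies.
def e_or(x):
    return -np.logaddexp(-e_concept_a(x), -e_concept_b(x))

# Negation (A AND NOT B): subtract a scaled energy for the negated concept.
def e_a_not_b(x, alpha=0.5):
    return e_concept_a(x) - alpha * e_concept_b(x)

def langevin_sample(energy, n_steps=200, step=0.05, noise=0.1, x0=0.0, seed=0):
    """Draw an approximate sample via Langevin dynamics:
    x <- x - step * grad E(x) + noise, using a numerical gradient."""
    rng = np.random.default_rng(seed)
    x = x0
    eps = 1e-4
    for _ in range(n_steps):
        grad = (energy(x + eps) - energy(x - eps)) / (2 * eps)
        x = x - step * grad + noise * np.sqrt(2 * step) * rng.normal()
    return x
```

Sampling from `e_and` concentrates near x = 0, the compromise between the two concepts, while `e_or` keeps low energy at both x = +1 and x = −1, mirroring how composed face attributes (e.g. "smiling" AND "male") are sampled in the paper.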