Paper Title

Towards causal generative scene models via competition of experts

Authors

Julius von Kügelgen, Ivan Ustyuzhaninov, Peter Gehler, Matthias Bethge, Bernhard Schölkopf

Abstract

Learning how to model complex scenes in a modular way with recombinable components is a pre-requisite for higher-order reasoning and acting in the physical world. However, current generative models lack the ability to capture the inherently compositional and layered nature of visual scenes. While recent work has made progress towards unsupervised learning of object-based scene representations, most models still maintain a global representation space (i.e., objects are not explicitly separated), and cannot generate scenes with novel object arrangement and depth ordering. Here, we present an alternative approach which uses an inductive bias encouraging modularity by training an ensemble of generative models (experts). During training, experts compete for explaining parts of a scene, and thus specialise on different object classes, with objects being identified as parts that re-occur across multiple scenes. Our model allows for controllable sampling of individual objects and recombination of experts in physically plausible ways. In contrast to other methods, depth layering and occlusion are handled correctly, moving this approach closer to a causal generative scene model. Experiments on simple toy data qualitatively demonstrate the conceptual advantages of the proposed approach.
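The abstract describes the mechanism only at a high level. The following is a minimal NumPy sketch, purely illustrative and not the authors' implementation, of what a competition of experts with depth-aware recombination could look like: it assumes each expert outputs an appearance image and a soft mask, lets experts compete per pixel via masked reconstruction error, and generates a new scene by alpha-compositing expert outputs in a chosen depth order so that nearer objects occlude farther ones. All names, shapes, and the specific competition rule are assumptions.

```python
# Illustrative sketch only: K experts each propose an appearance and a soft mask
# for a toy scene; experts compete per pixel, and scenes are re-composed by
# alpha-compositing expert outputs back-to-front to respect depth and occlusion.
import numpy as np

rng = np.random.default_rng(0)

H, W, K = 8, 8, 3                      # toy image size and number of experts
scene = rng.random((H, W, 3))          # stand-in for an input scene

# Each expert proposes an RGB appearance and a soft mask over the pixels it claims.
appearances = rng.random((K, H, W, 3))
logits = rng.normal(size=(K, H, W))

# Competition: softmax over experts at every pixel; training would sharpen this
# so that each expert specialises on the object class it explains best.
masks = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

# Per-expert reconstruction error; the expert with the lowest error "wins" a pixel.
errors = ((appearances - scene[None]) ** 2).sum(-1)   # shape (K, H, W)
winner = np.argmin(errors, axis=0)                     # shape (H, W)

def composite(appearances, masks, depth_order):
    """Recombine experts in a chosen depth order (listed front to back)."""
    canvas = np.zeros((H, W, 3))
    for k in reversed(depth_order):        # paint back-to-front
        alpha = masks[k][..., None]
        canvas = alpha * appearances[k] + (1 - alpha) * canvas
    return canvas

# Controllable generation: the same experts, recombined with a new depth ordering.
recombined = composite(appearances, masks, depth_order=[0, 2, 1])
print(winner.shape, recombined.shape)
```

Because compositing is done layer by layer, swapping the depth order or the experts in the list yields physically plausible novel scenes, which is the kind of controllable recombination the abstract refers to.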
