Title

Compositional Scene Modeling with Global Object-Centric Representations

Authors

Tonglin Chen, Bin Li, Zhimeng Shen, Xiangyang Xue

Abstract

The appearance of the same object may vary across scene images due to perspective changes and occlusions between objects. Humans can easily identify the same object, even when occlusions exist, by completing the occluded parts based on its canonical image in memory. Achieving this ability remains a challenge for machine learning, especially in the unsupervised learning setting. Inspired by this human ability, this paper proposes a compositional scene modeling method to infer global representations of canonical images of objects without any supervision. The representation of each object is divided into an intrinsic part, which characterizes globally invariant information (i.e., the canonical representation of an object), and an extrinsic part, which characterizes scene-dependent information (e.g., position and size). To infer the intrinsic representation of each object, we employ a patch-matching strategy to align the representation of a potentially occluded object with the canonical representations of objects, and sample the most probable canonical representation based on the object category determined by amortized variational inference. Extensive experiments are conducted on four object-centric learning benchmarks, and the results demonstrate that the proposed method not only outperforms state-of-the-art methods in terms of segmentation and reconstruction, but also achieves good global object identification performance.
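The abstract describes splitting each object's representation into an intrinsic (canonical) part and an extrinsic (scene-dependent) part, and selecting the most probable canonical representation from a categorical distribution produced by amortized variational inference. The sketch below illustrates one possible form of such a split; all names (ObjectEncoder, canonical_bank, the dimensions, and the hard/soft selection scheme) are assumptions for illustration, not the paper's actual implementation.

```python
# Minimal PyTorch sketch of an intrinsic/extrinsic split with a learned bank of
# canonical representations. Hypothetical; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ObjectEncoder(nn.Module):
    """Splits an object's feature vector into an intrinsic (canonical) part
    and an extrinsic (scene-dependent) part such as position and size."""

    def __init__(self, feat_dim=128, intrinsic_dim=64, extrinsic_dim=4, num_canonical=32):
        super().__init__()
        # Learned bank of global canonical representations, shared across scenes.
        self.canonical_bank = nn.Parameter(torch.randn(num_canonical, intrinsic_dim))
        # Amortized inference networks.
        self.to_intrinsic = nn.Linear(feat_dim, intrinsic_dim)
        self.to_extrinsic = nn.Linear(feat_dim, extrinsic_dim)  # e.g., (x, y, w, h)

    def forward(self, obj_feat, hard=True):
        # Scene-specific embedding of the (possibly occluded) object.
        intrinsic_query = self.to_intrinsic(obj_feat)       # [B, D_int]
        extrinsic = self.to_extrinsic(obj_feat)             # [B, D_ext]

        # Match the query against every canonical representation; the logits
        # parameterize a categorical distribution over object identities.
        logits = intrinsic_query @ self.canonical_bank.t()  # [B, K]
        probs = F.softmax(logits, dim=-1)

        if hard:
            # Pick the most probable canonical representation (identification).
            idx = probs.argmax(dim=-1)
            canonical = self.canonical_bank[idx]
        else:
            # Differentiable relaxation usable during training.
            canonical = probs @ self.canonical_bank

        return canonical, extrinsic, probs


# Usage: encode a batch of object features extracted from a scene.
encoder = ObjectEncoder()
obj_feat = torch.randn(8, 128)
canonical, extrinsic, probs = encoder(obj_feat)
print(canonical.shape, extrinsic.shape, probs.shape)  # [8, 64] [8, 4] [8, 32]
```

Because the canonical bank is shared across scenes, the argmax over the categorical distribution can serve directly as a global object identity, which is the behavior the abstract's identification experiments evaluate.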
