Boosting Object Representation Learning via Motion and Object Continuity

Paper Title

Boosting Object Representation Learning via Motion and Object Continuity

Authors

Quentin Delfosse, Wolfgang Stammer, Thomas Rothenbacher, Dwarak Vittal, Kristian Kersting

Abstract

Recent unsupervised multi-object detection models have shown impressive performance improvements, largely attributed to novel architectural inductive biases. Unfortunately, they may produce suboptimal object encodings for downstream tasks. To overcome this, we propose to exploit object motion and continuity, i.e., objects do not pop in and out of existence. This is accomplished through two mechanisms: (i) providing priors on the location of objects through integration of optical flow, and (ii) a contrastive object continuity loss across consecutive image frames. Rather than developing an explicit deep architecture, the resulting Motion and Object Continuity (MOC) scheme can be instantiated using any baseline object detection model. Our results show large improvements in the performances of a SOTA model in terms of object discovery, convergence speed and overall latent object representations, particularly for playing Atari games. Overall, we show clear benefits of integrating motion and object continuity for downstream tasks, moving beyond object representation learning based only on reconstruction.
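The second mechanism named in the abstract, a contrastive object continuity loss across consecutive frames, can be illustrated with a minimal sketch. This is a generic InfoNCE-style loss, not the formulation used in the paper. The function name `continuity_contrastive_loss`, the `temperature` parameter, and the assumption that row i of both inputs encodes the same object in frames t and t+1 are all hypothetical choices made for illustration.

```python
import math


def _normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def continuity_contrastive_loss(z_t, z_next, temperature=0.1):
    """Generic InfoNCE-style loss over object encodings of two frames.

    Assumption (not from the paper): z_t[i] and z_next[i] encode the same
    object in consecutive frames; every other pairing is a negative.
    """
    anchors = [_normalize(v) for v in z_t]
    candidates = [_normalize(v) for v in z_next]
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Cosine similarity of this object to every object in the next frame.
        logits = [
            sum(a * c for a, c in zip(anchor, candidate)) / temperature
            for candidate in candidates
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of the matching object.
        total += log_denominator - logits[i]
    return total / len(anchors)


# Usage: encodings that stay consistent across frames give a lower loss
# than encodings whose identities are shuffled between frames.
frame_t = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
consistent = continuity_contrastive_loss(frame_t, frame_t)
shuffled = continuity_contrastive_loss(frame_t, frame_t[1:] + frame_t[:1])
print(consistent < shuffled)
```

Under these assumptions, minimizing the loss pulls the encodings of the same object in adjacent frames together and pushes encodings of different objects apart, which is the intuition behind "objects do not pop in and out of existence".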
