Title
Hindsight for Foresight: Unsupervised Structured Dynamics Models from Physical Interaction
Authors
Abstract
A key challenge for an agent learning to interact with the world is to reason about physical properties of objects and to foresee their dynamics under the effect of applied forces. In order to scale learning through interaction to many objects and scenes, robots should be able to improve their own performance from real-world experience without requiring human supervision. To this end, we propose a novel approach for modeling the dynamics of a robot's interactions directly from unlabeled 3D point clouds and images. Unlike previous approaches, our method does not require ground-truth data associations provided by a tracker or any pre-trained perception network. To learn from unlabeled real-world interaction data, we enforce consistency of estimated 3D clouds, actions and 2D images with observed ones. Our joint forward and inverse network learns to segment a scene into salient object parts and predicts their 3D motion under the effect of applied actions. Moreover, our object-centric model outputs action-conditioned 3D scene flow, object masks and 2D optical flow as emergent properties. Our extensive evaluation both in simulation and with real-world data demonstrates that our formulation leads to effective, interpretable models that can be used for visuomotor control and planning. Videos, code and dataset are available at http://hind4sight.cs.uni-freiburg.de
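To make the joint forward/inverse formulation in the abstract concrete, here is a minimal illustrative sketch: a forward model maps a 3D point cloud plus an action to per-point 3D scene flow, and an inverse model recovers the action from a cloud pair, with a cycle-consistency loss coupling the two. All layer sizes, the 4-D action parameterization, and the loss are hypothetical assumptions for illustration, not the authors' architecture.

```python
import numpy as np

# Illustrative sketch only: network shapes, action parameterization, and the
# cycle loss are assumptions, not the paper's actual implementation.

rng = np.random.default_rng(0)

def mlp(sizes):
    """Tiny random-weight MLP with tanh hidden activations (stand-in for a trained net)."""
    weights = [rng.standard_normal((i, o)) * 0.1
               for i, o in zip(sizes[:-1], sizes[1:])]
    def forward(x):
        for w in weights[:-1]:
            x = np.tanh(x @ w)
        return x @ weights[-1]
    return forward

# Forward model: per-point coordinates + action -> per-point 3D scene flow.
forward_model = mlp([3 + 4, 32, 3])
# Inverse model: concatenated before/after point coordinates -> action estimate.
inverse_model = mlp([3 + 3, 32, 4])

def predict_flow(cloud, action):
    """Action-conditioned 3D scene flow for every point in the cloud."""
    inp = np.concatenate([cloud, np.tile(action, (len(cloud), 1))], axis=1)
    return forward_model(inp)

cloud_t = rng.standard_normal((128, 3))    # observed point cloud at time t
action = np.array([0.5, -0.2, 0.1, 0.0])   # hypothetical parameterized push

flow = predict_flow(cloud_t, action)       # emergent 3D scene flow
cloud_t1_pred = cloud_t + flow             # predicted next cloud

# The inverse model re-estimates the action from the cloud pair; penalizing the
# mismatch gives a consistency signal without ground-truth correspondences.
pair = np.concatenate([cloud_t, cloud_t1_pred], axis=1)
action_hat = inverse_model(pair).mean(axis=0)
cycle_loss = float(np.mean((action_hat - action) ** 2))
print(flow.shape, cycle_loss)
```

In the actual method such a loss would be combined with photometric/geometric consistency between estimated and observed clouds and images; this sketch only shows how the forward and inverse directions can supervise each other.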