Paper Title

Footprints and Free Space from a Single Color Image

Paper Authors

Jamie Watson, Michael Firman, Aron Monszpart, Gabriel J. Brostow

Paper Abstract

Understanding the shape of a scene from a single color image is a formidable computer vision task. However, most methods aim to predict the geometry of surfaces that are visible to the camera, which is of limited use when planning paths for robots or augmented reality agents. Such agents can only move when grounded on a traversable surface, which we define as the set of classes which humans can also walk over, such as grass, footpaths and pavement. Models which predict beyond the line of sight often parameterize the scene with voxels or meshes, which can be expensive to use in machine learning frameworks. We introduce a model to predict the geometry of both visible and occluded traversable surfaces, given a single RGB image as input. We learn from stereo video sequences, using camera poses, per-frame depth and semantic segmentation to form training data, which is used to supervise an image-to-image network. We train models from the KITTI driving dataset, the indoor Matterport dataset, and from our own casually captured stereo footage. We find that a surprisingly low bar for spatial coverage of training scenes is required. We validate our algorithm against a range of strong baselines, and include an assessment of our predictions for a path-planning task.
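To make the abstract's setup concrete, below is a minimal sketch of an image-to-image network of the kind described: a single RGB image goes in, and per-pixel traversability outputs come out. The four output channels (visible-ground mask, occluded-ground mask, and a depth for each) are an assumed decomposition of "visible and occluded traversable surfaces"; the name FootprintNet, the layer sizes, and the channel names are illustrative only and not the authors' implementation.

```python
# A minimal sketch (not the authors' architecture): an image-to-image network
# mapping one RGB image to per-pixel traversability outputs. Channel layout,
# layer sizes and names are assumptions for illustration.
import torch
import torch.nn as nn


class FootprintNet(nn.Module):
    def __init__(self, out_channels: int = 4):
        super().__init__()
        # Tiny encoder-decoder; a real model would use a pretrained backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, out_channels, 4, stride=2, padding=1),
        )

    def forward(self, rgb: torch.Tensor) -> dict:
        x = self.decoder(self.encoder(rgb))
        return {
            "visible_ground": torch.sigmoid(x[:, 0:1]),  # traversable and visible
            "hidden_ground": torch.sigmoid(x[:, 1:2]),   # traversable but occluded
            "visible_depth": x[:, 2:3],                  # depth of the visible surface
            "hidden_depth": x[:, 3:4],                   # depth of the occluded ground
        }


if __name__ == "__main__":
    model = FootprintNet()
    image = torch.rand(1, 3, 192, 640)  # e.g. a KITTI-sized crop
    outputs = model(image)
    print({k: tuple(v.shape) for k, v in outputs.items()})
```

In training, each output channel would be supervised with labels derived from the stereo video pipeline the abstract mentions (camera poses, per-frame depth, and semantic segmentation); the exact losses are not specified here.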
