Paper title
Map-free Visual Relocalization: Metric Pose Relative to a Single Image
Paper authors
Abstract
Can we relocalize in a scene represented by a single reference image? Standard visual relocalization requires hundreds of images and scale calibration to build a scene-specific 3D map. In contrast, we propose Map-free Relocalization, i.e., using only one photo of a scene to enable instant, metric scaled relocalization. Existing datasets are not suitable to benchmark map-free relocalization, due to their focus on large scenes or their limited variability. Thus, we have constructed a new dataset of 655 small places of interest, such as sculptures, murals and fountains, collected worldwide. Each place comes with a reference image to serve as a relocalization anchor, and dozens of query images with known, metric camera poses. The dataset features changing conditions, stark viewpoint changes, high variability across places, and queries with low to no visual overlap with the reference image. We identify two viable families of existing methods to provide baseline results: relative pose regression, and feature matching combined with single-image depth prediction. While these methods show reasonable performance on some favorable scenes in our dataset, map-free relocalization proves to be a challenge that requires new, innovative solutions.
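The second baseline family named in the abstract, feature matching combined with single-image depth prediction, relies on the fact that two-view geometry alone recovers translation only up to scale, so predicted depth has to supply the metric scale. The following is a minimal sketch of one way that scale could be recovered. It is not the authors' implementation. It assumes the relative rotation `R` and unit translation direction `t_dir` have already been estimated (for example from an essential matrix), that both images share the intrinsics `K`, and that `ref_depth` holds predicted metric depth at the matched reference pixels. All function and parameter names are hypothetical.

```python
import numpy as np


def backproject(pixels, depths, K):
    """Lift Nx2 pixel coordinates with per-pixel depth to Nx3 camera-frame points."""
    ones = np.ones((pixels.shape[0], 1))
    rays = (np.linalg.inv(K) @ np.hstack([pixels, ones]).T).T  # rays with z = 1
    return rays * depths[:, None]


def recover_metric_scale(ref_px, qry_px, ref_depth, K, R, t_dir):
    """Least-squares scale s such that query rays align with R @ X + s * t_dir.

    ref_px, qry_px : Nx2 matched pixel coordinates (reference, query)
    ref_depth      : N metric depths at the reference pixels (assumed to come
                     from a single-image depth predictor)
    R, t_dir       : relative rotation (3x3) and unit translation direction (3,)
                     mapping reference-camera points into the query camera
    """
    X_rot = backproject(ref_px, ref_depth, K) @ R.T  # rotated 3D points
    ones = np.ones((qry_px.shape[0], 1))
    y = (np.linalg.inv(K) @ np.hstack([qry_px, ones]).T).T  # query viewing rays

    # Each query ray y must be parallel to R X + s t, so
    # y x (R X + s t) = 0, i.e. s * (y x t) = -(y x R X).
    a = np.cross(y, np.broadcast_to(t_dir, y.shape))
    b = -np.cross(y, X_rot)
    return float(np.sum(a * b) / np.sum(a * a))
```

The metric translation is then `recover_metric_scale(...) * t_dir`. With real predicted depth and noisy matches, a robust estimator (for example a median over per-match scales or a RANSAC loop) would likely be preferable to the plain least-squares fit used here.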