Paper Title
Pseudo RGB-D for Self-Improving Monocular SLAM and Depth Prediction
Paper Authors
Paper Abstract
Classical monocular Simultaneous Localization And Mapping (SLAM) and the recently emerging convolutional neural networks (CNNs) for monocular depth prediction represent two largely disjoint approaches to building a 3D map of the surrounding environment. In this paper, we demonstrate that coupling the two, by leveraging the strengths of each, mitigates the other's shortcomings. Specifically, we propose a joint narrow- and wide-baseline based self-improving framework, where on the one hand the CNN-predicted depth is leveraged to perform pseudo RGB-D feature-based SLAM, leading to better accuracy and robustness than the monocular RGB SLAM baseline. On the other hand, the bundle-adjusted 3D scene structures and camera poses from the more principled geometric SLAM are injected back into the depth network through novel wide-baseline losses proposed for improving the depth prediction network, which then continues to contribute towards better pose and 3D structure estimation in the next iteration. We emphasize that our framework requires only unlabeled monocular videos in both the training and inference stages, and yet is able to outperform state-of-the-art self-supervised monocular and stereo depth prediction networks (e.g., Monodepth2) and the feature-based monocular SLAM system (i.e., ORB-SLAM). Extensive experiments on the KITTI and TUM RGB-D datasets verify the superiority of our self-improving geometry-CNN framework.
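The wide-baseline losses described above tie SLAM geometry back into the depth network by reprojecting one view into another using the predicted depth and the bundle-adjusted camera poses. As a rough illustration of the reprojection machinery that such a loss builds on, below is a minimal self-contained PyTorch sketch of a photometric warping loss between two frames, given a predicted target-frame depth map and a SLAM-estimated relative pose. The function name, tensor layout, and the simple L1 photometric penalty are assumptions made for illustration; this is not the paper's actual loss formulation.

```python
import torch
import torch.nn.functional as F

def photometric_reprojection_loss(img_src, img_tgt, depth_tgt, K, T_src_tgt):
    """Hypothetical sketch: warp img_src into the target view using the
    predicted depth_tgt and the relative pose T_src_tgt (target -> source),
    then penalize the photometric difference against img_tgt.

    img_src, img_tgt: (B, 3, H, W) RGB tensors
    depth_tgt:        (B, 1, H, W) predicted depth for the target frame
    K:                (B, 3, 3) camera intrinsics
    T_src_tgt:        (B, 4, 4) SLAM-estimated pose, target to source frame
    """
    B, _, H, W = img_tgt.shape
    device = img_tgt.device

    # Homogeneous pixel grid of the target image: (B, 3, H*W).
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).view(1, 3, -1).expand(B, -1, -1)

    # Back-project pixels to target camera space using the predicted depth,
    # transform them into the source camera frame, and re-project with K.
    cam = torch.linalg.inv(K) @ pix * depth_tgt.view(B, 1, -1)
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=device)], dim=1)
    src_cam = (T_src_tgt @ cam_h)[:, :3]
    src_pix = K @ src_cam
    src_pix = src_pix[:, :2] / src_pix[:, 2:].clamp(min=1e-6)

    # Normalize pixel coordinates to [-1, 1] and bilinearly sample img_src.
    gx = 2.0 * src_pix[:, 0] / (W - 1) - 1.0
    gy = 2.0 * src_pix[:, 1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1).view(B, H, W, 2)
    warped = F.grid_sample(img_src, grid, align_corners=True,
                           padding_mode="border")

    # Simple L1 photometric penalty (illustrative choice only).
    return (warped - img_tgt).abs().mean()
```

In the framework sketched by the abstract, a loss of this kind would be evaluated over wide-baseline keyframe pairs selected from the bundle-adjusted SLAM trajectory, rather than only over adjacent frames, which is what distinguishes the proposed wide-baseline supervision from the narrow-baseline losses used in standard self-supervised depth pipelines.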