Title
RAUM-VO: Rotational Adjusted Unsupervised Monocular Visual Odometry
Authors
Abstract
Unsupervised learning for monocular camera motion and 3D scene understanding has gained popularity over traditional methods, which rely on epipolar geometry or non-linear optimization. Notably, deep learning can overcome many issues of monocular vision, such as perceptual aliasing, low-textured areas, scale drift, and degenerate motions. Also, compared to supervised learning, we can fully leverage video stream data without the need for depth or motion labels. However, in this work, we note that rotational motion can limit the accuracy of unsupervised pose networks more than the translational component. Therefore, we present RAUM-VO, an approach based on a model-free epipolar constraint for frame-to-frame motion estimation (F2F) to adjust the rotation during training and online inference. To this end, we match 2D keypoints between consecutive frames using pre-trained deep networks, SuperPoint and SuperGlue, while training a network for depth and pose estimation using an unsupervised training protocol. Then, we adjust the predicted rotation with the motion estimated by F2F using the 2D matches, initializing the solver with the pose network prediction. Ultimately, RAUM-VO shows a considerable accuracy improvement compared to other unsupervised pose networks on the KITTI dataset, while being less complex than hybrid or traditional approaches and achieving comparable state-of-the-art results.
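The core idea of the rotation adjustment can be sketched without the paper's full pipeline: given 2D keypoint matches between two frames, estimate an essential matrix from the epipolar constraint, decompose it into candidate rotations, and use the pose network's prediction to pick (and replace) the rotation. The following is a minimal NumPy sketch of that idea using the classic eight-point algorithm on normalized image coordinates; all function names are illustrative, and the real method uses a more robust solver initialized by the network prediction rather than a simple closest-candidate selection.

```python
import numpy as np

def estimate_essential(x1, x2):
    """Eight-point estimate of the essential matrix from normalized image
    coordinates x1, x2 of shape (N, 2), N >= 8, related by x2' E x1 = 0."""
    n = x1.shape[0]
    a = np.zeros((n, 9))
    for i in range(n):
        u1, v1 = x1[i]
        u2, v2 = x2[i]
        # Row of the linear system for the flattened (row-major) E.
        a[i] = [u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2, u1, v1, 1.0]
    _, _, vt = np.linalg.svd(a)
    e = vt[-1].reshape(3, 3)
    # Project onto the essential-matrix manifold: rank 2, equal singular values.
    u, _, vt = np.linalg.svd(e)
    return u @ np.diag([1.0, 1.0, 0.0]) @ vt

def rotation_candidates(e):
    """The two rotation candidates from the standard decomposition E ~ [t]x R."""
    u, _, vt = np.linalg.svd(e)
    # Make U and V proper rotations (E is only defined up to sign/scale).
    if np.linalg.det(u) < 0:
        u = -u
    if np.linalg.det(vt) < 0:
        vt = -vt
    w = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    return [u @ w @ vt, u @ w.T @ vt]

def adjust_rotation(r_pred, x1, x2):
    """Replace the network-predicted rotation r_pred with the F2F rotation
    candidate closest to it in geodesic distance on SO(3)."""
    cands = rotation_candidates(estimate_essential(x1, x2))

    def geodesic(ra, rb):
        c = (np.trace(ra.T @ rb) - 1.0) / 2.0
        return np.arccos(np.clip(c, -1.0, 1.0))

    return min(cands, key=lambda r: geodesic(r, r_pred))
```

With noise-free correspondences and a reasonable network prediction, the selected candidate coincides with the true relative rotation; in practice the estimate would come from matches filtered by SuperGlue and a robust (e.g. RANSAC-based) solver.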