Paper Title
Graph Attention Network for Camera Relocalization on Dynamic Scenes
Authors
Abstract
We devise a graph attention network-based approach for learning a scene triangle mesh representation in order to estimate the camera position of an image in a dynamic environment. Previous approaches build a scene-dependent model that explicitly or implicitly embeds the structure of the scene, using convolutional neural networks or decision trees to establish 2D/3D-3D correspondences. Such a mapping overfits the target scene and does not generalize well to dynamic changes in the environment. Our work introduces a novel approach that solves the camera relocalization problem using the available triangle mesh. Our 3D-3D matching framework consists of three blocks: (1) a graph neural network to compute the embedding of mesh vertices, (2) a convolutional neural network to compute the embedding of grid cells defined on the RGB-D image, and (3) a neural network model to establish the correspondence between the two embeddings. These three components are trained end-to-end. To predict the final pose, we run the RANSAC algorithm to generate camera pose hypotheses, and we refine the prediction using the point-cloud representation. Our approach significantly improves the camera pose accuracy of the state-of-the-art method from $0.358$ to $0.506$ on the RIO10 benchmark for dynamic indoor camera relocalization.
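The correspondence step of the framework above can be illustrated with a minimal sketch. The abstract does not specify the matching model, so this is only an assumed nearest-neighbor matching in a shared embedding space: the `vertex_emb` and `cell_emb` arrays stand in for the outputs of the GNN (block 1) and CNN (block 2), and all names, dimensions, and the cosine-similarity matching rule are hypothetical.

```python
import numpy as np

def match_embeddings(vertex_emb, cell_emb):
    """Match each image grid cell to its nearest mesh vertex in embedding space.

    vertex_emb: (V, D) array -- stand-in for the GNN output, one row per mesh vertex.
    cell_emb:   (C, D) array -- stand-in for the CNN output, one row per RGB-D grid cell.
    Returns an index array of shape (C,): the matched vertex for each cell.
    """
    # L2-normalize rows so that dot products are cosine similarities.
    v = vertex_emb / np.linalg.norm(vertex_emb, axis=1, keepdims=True)
    c = cell_emb / np.linalg.norm(cell_emb, axis=1, keepdims=True)
    sim = c @ v.T              # (C, V) similarity matrix
    return sim.argmax(axis=1)  # nearest vertex per cell

# Toy example: 4 vertex embeddings, 3 cell embeddings that are
# noisy copies of vertices 2, 0, and 3.
rng = np.random.default_rng(0)
vertices = rng.normal(size=(4, 8))
cells = vertices[[2, 0, 3]] + 0.01 * rng.normal(size=(3, 8))
matches = match_embeddings(vertices, cells)
```

In the actual method, such 3D-3D correspondences would then feed the RANSAC stage, which samples correspondence subsets to generate camera pose hypotheses before point-cloud refinement.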