Paper Title

MOLTR: Multiple Object Localisation, Tracking, and Reconstruction from Monocular RGB Videos

Authors

Kejie Li, Hamid Rezatofighi, Ian Reid

Abstract

Semantic-aware reconstruction is more advantageous than geometric-only reconstruction for future robotic and AR/VR applications because it represents not only where things are, but also what things are. Object-centric mapping is the task of building an object-level reconstruction in which objects are separate, meaningful entities that convey both geometric and semantic information. In this paper, we present MOLTR, a solution to object-centric mapping using only monocular image sequences and camera poses. It is able to localise, track, and reconstruct multiple objects in an online fashion as an RGB camera captures a video of its surroundings. Given a new RGB frame, MOLTR first applies a monocular 3D detector to localise objects of interest and extract their shape codes, which represent the object shapes in a learned embedding space. Detections are then merged into existing objects in the map after data association. The motion state (i.e. kinematics and motion status) of each object is tracked by a multiple-model Bayesian filter, and each object's shape is progressively refined by fusing multiple shape codes. We evaluate localisation, tracking, and reconstruction on benchmark datasets for indoor and outdoor scenes, and show superior performance over previous approaches.
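The per-frame loop described in the abstract (detect, associate, update motion state, fuse shape codes) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes detections arrive as `(centroid, shape_code)` pairs, uses a simple constant-velocity update in place of the multiple-model Bayesian filter, greedy nearest-centroid matching with a hypothetical `gate` threshold in place of the paper's data association, and a running mean of shape codes as one plausible fusion rule in the learned embedding space.

```python
import numpy as np

class Track:
    """One mapped object: a kinematic state plus a fused shape code."""
    def __init__(self, centroid, shape_code):
        self.centroid = np.asarray(centroid, dtype=float)
        self.velocity = np.zeros_like(self.centroid)
        self.shape_code = np.asarray(shape_code, dtype=float)
        self.num_codes = 1

    def update(self, centroid, shape_code):
        # Constant-velocity update: a stand-in for the paper's
        # multiple-model Bayesian filter over the motion state.
        centroid = np.asarray(centroid, dtype=float)
        self.velocity = centroid - self.centroid
        self.centroid = centroid
        # Progressive shape refinement: incremental mean of all
        # observed shape codes (an assumed fusion rule).
        self.num_codes += 1
        self.shape_code += (np.asarray(shape_code, dtype=float)
                            - self.shape_code) / self.num_codes


def associate(tracks, detections, gate=1.0):
    """Greedy nearest-centroid data association; unmatched
    detections spawn new tracks. `gate` is a hypothetical
    distance threshold, not a value from the paper."""
    for centroid, code in detections:
        dists = [np.linalg.norm(t.centroid - centroid) for t in tracks]
        if dists and min(dists) < gate:
            tracks[int(np.argmin(dists))].update(centroid, code)
        else:
            tracks.append(Track(centroid, code))
    return tracks
```

In an online setting, `associate` would be called once per frame with the monocular 3D detector's output, so the map grows as new objects appear while existing objects keep refining their motion state and shape code.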
