LASER: LAtent SpacE Rendering for 2D Visual Localization

Paper Title

LASER: LAtent SpacE Rendering for 2D Visual Localization

Authors

Zhixiang Min, Naji Khosravan, Zachary Bessinger, Manjunath Narayana, Sing Bing Kang, Enrique Dunn, Ivaylo Boyadzhiev

Abstract

We present LASER, an image-based Monte Carlo Localization (MCL) framework for 2D floor maps. LASER introduces the concept of latent space rendering, where 2D pose hypotheses on the floor map are directly rendered into a geometrically-structured latent space by aggregating viewing ray features. Through a tightly coupled rendering codebook scheme, the viewing ray features are dynamically determined at rendering time based on their geometries (i.e., length and incident angle), endowing our representation with view-dependent fine-grained variability. Our codebook scheme effectively disentangles feature encoding from rendering, allowing the latent space rendering to run at speeds above 10 kHz. Moreover, through metric learning, our geometrically-structured latent space is common to both pose hypotheses and query images with arbitrary fields of view. As a result, LASER achieves state-of-the-art performance on large-scale indoor localization datasets (i.e., ZInD and Structured3D) for both panorama and perspective image queries, while significantly outperforming existing learning-based methods in speed.
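The codebook idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. The table values (random numbers standing in for learned features), the uniform binning of ray length and incident angle, the mean aggregation, and every function name below are assumptions made for illustration only. In the paper the query descriptor comes from an image, whereas here a rendered descriptor stands in for it. The sketch shows only why rendering can be cheap once features are precomputed: scoring a pose hypothesis reduces to table lookups and an average.

```python
import math
import random


def build_codebook(n_len_bins, n_ang_bins, dim, seed=0):
    """Hypothetical stand-in for a learned codebook: one vector per (length, angle) bin."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(n_ang_bins)]
            for _ in range(n_len_bins)]


def bin_index(value, max_value, n_bins):
    """Quantize a value in [0, max_value] to a bin index in [0, n_bins - 1]."""
    clipped = min(max(value, 0.0), max_value)
    return min(int(clipped / max_value * n_bins), n_bins - 1)


def render_pose(rays, codebook, max_len=10.0):
    """Average the codebook features of all rays cast from one pose hypothesis.

    rays: list of (length, incident_angle) pairs, incident angle in [0, pi/2].
    """
    n_len_bins = len(codebook)
    n_ang_bins = len(codebook[0])
    dim = len(codebook[0][0])
    total = [0.0] * dim
    for length, angle in rays:
        i = bin_index(length, max_len, n_len_bins)
        j = bin_index(angle, math.pi / 2, n_ang_bins)
        for k in range(dim):
            total[k] += codebook[i][j][k]
    return [t / len(rays) for t in total]


def cosine(a, b):
    """Cosine similarity between two descriptors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


# Two hypothetical pose hypotheses, each described by the geometry of its rays.
codebook = build_codebook(n_len_bins=8, n_ang_bins=4, dim=16)
rays_a = [(2.0, 0.1), (3.5, 0.8), (1.2, 1.3)]
rays_b = [(9.0, 1.5), (8.0, 0.2), (7.5, 0.6)]

# Stand-in query descriptor: pretend the query image was taken at pose A.
query = render_pose(rays_a, codebook)
scores = [cosine(query, render_pose(r, codebook)) for r in (rays_a, rays_b)]
```

In a Monte Carlo localization loop, scores like these would serve as particle weights before resampling.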
