Paper Title

Topological Planning with Transformers for Vision-and-Language Navigation

Authors

Kevin Chen, Junshen K. Chen, Jo Chuang, Marynel Vázquez, Silvio Savarese

Abstract

Conventional approaches to vision-and-language navigation (VLN) are trained end-to-end but struggle to perform well in freely traversable environments. Inspired by the robotics community, we propose a modular approach to VLN using topological maps. Given a natural language instruction and topological map, our approach leverages attention mechanisms to predict a navigation plan in the map. The plan is then executed with low-level actions (e.g. forward, rotate) using a robust controller. Experiments show that our method outperforms previous end-to-end approaches, generates interpretable navigation plans, and exhibits intelligent behaviors such as backtracking.
