Learning to Stop: A Simple yet Effective Approach to Urban Vision-Language Navigation

Paper Title

Learning to Stop: A Simple yet Effective Approach to Urban Vision-Language Navigation

Authors

Jiannan Xiang, Xin Eric Wang, William Yang Wang

Abstract

Vision-and-Language Navigation (VLN) is a natural language grounding task in which an agent learns to follow language instructions and navigate to specified destinations in real-world environments. A key challenge is recognizing the correct location and stopping there, especially in complicated outdoor environments. Existing methods treat the STOP action the same as other actions, which leads to undesirable behavior: the agent often fails to stop at the destination even when it is on the right path. We therefore propose Learning to Stop (L2Stop), a simple yet effective policy module that differentiates STOP from other actions. Our approach achieves a new state of the art on Touchdown, a challenging urban VLN dataset, outperforming the baseline by 6.89% (absolute improvement) on Success weighted by Edit Distance (SED).
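To make the SED metric concrete, here is a minimal sketch assuming the definition commonly used with the Touchdown dataset: SED = (1/N) * sum_i S_i * (1 - lev(P_i, R_i) / max(|P_i|, |R_i|)), where S_i is 1 if episode i succeeded and 0 otherwise, P_i is the predicted path, R_i is the reference path, and lev is the Levenshtein edit distance between the two node sequences. The function names `levenshtein` and `sed`, the representation of paths as lists of node IDs, and the precomputed success flag are all illustrative assumptions, not the authors' evaluation code.

```python
from typing import Hashable, List, Sequence, Tuple

# Hypothetical episode record: (success flag, predicted path, reference path).
Episode = Tuple[bool, Sequence[Hashable], Sequence[Hashable]]


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Edit distance between two sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free when equal)
            )
        prev = curr
    return prev[-1]


def sed(episodes: List[Episode]) -> float:
    """Mean SED: failed episodes score 0, successful ones are
    discounted by normalized edit distance to the reference path."""
    total = 0.0
    for success, pred, ref in episodes:
        if not success:
            continue
        denom = max(len(pred), len(ref))
        # Assumption: two empty paths count as a perfect match.
        total += 1.0 if denom == 0 else 1.0 - levenshtein(pred, ref) / denom
    return total / len(episodes)


if __name__ == "__main__":
    episodes = [
        (True, ["a", "b", "c", "d"], ["a", "b", "c"]),  # one extra node
        (False, ["a", "x"], ["a", "b", "c"]),           # failed episode
    ]
    print(sed(episodes))  # 0.375
```

In this example the successful episode overshoots the three-node reference by one node, scoring 1 - 1/4 = 0.75, while the failed episode scores 0, so the mean is 0.375. The example also shows why stopping matters under this metric: an agent that follows the right route but never stops is counted as a failure and contributes nothing.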
