One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones

Paper Title

One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones

Authors

Chan Hee Song, Jihyung Kil, Tai-Yu Pan, Brian M. Sadler, Wei-Lun Chao, Yu Su

Abstract

We study the problem of developing autonomous agents that can follow human instructions to infer and perform a sequence of actions to complete the underlying task. Significant progress has been made in recent years, especially for tasks with short horizons. However, when it comes to long-horizon tasks with extended sequences of actions, an agent can easily ignore some instructions or get stuck in the middle of the long instructions and eventually fail the task. To address this challenge, we propose a model-agnostic milestone-based task tracker (M-TRACK) to guide the agent and monitor its progress. Specifically, we propose a milestone builder that tags the instructions with navigation and interaction milestones which the agent needs to complete step by step, and a milestone checker that systematically checks the agent's progress in its current milestone and determines when to proceed to the next. On the challenging ALFRED dataset, our M-TRACK leads to a notable 33% and 52% relative improvement in unseen success rate over two competitive base models.
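
The builder/checker split described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how milestones are represented or how completion is decided, so the `Milestone`, `Observation`, and `MilestoneTracker` names, and the completion rules (a navigation milestone counts as met once its target is visible; an interaction milestone once the agent has successfully acted on its target) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class Milestone:
    """Hypothetical milestone record: a kind plus the object it refers to."""
    kind: str    # "navigation" or "interaction"
    target: str  # e.g. "mug"


@dataclass
class Observation:
    """Hypothetical summary of one environment step (assumed, not from the paper)."""
    visible_objects: Set[str] = field(default_factory=set)
    last_interaction: Optional[str] = None  # object successfully acted on, if any


class MilestoneTracker:
    """Walks through an ordered list of milestones, one at a time."""

    def __init__(self, milestones: List[Milestone]):
        self.milestones = milestones
        self.index = 0  # position of the milestone currently being pursued

    @property
    def current(self) -> Optional[Milestone]:
        if self.index < len(self.milestones):
            return self.milestones[self.index]
        return None

    @property
    def done(self) -> bool:
        return self.current is None

    def is_met(self, milestone: Milestone, obs: Observation) -> bool:
        # Assumed completion rules; a real checker could be learned or rule-based.
        if milestone.kind == "navigation":
            return milestone.target in obs.visible_objects
        if milestone.kind == "interaction":
            return obs.last_interaction == milestone.target
        raise ValueError(f"unknown milestone kind: {milestone.kind}")

    def update(self, obs: Observation) -> Optional[Milestone]:
        """Advance to the next milestone only if the current one is met."""
        current = self.current
        if current is not None and self.is_met(current, obs):
            self.index += 1
        return self.current


# Usage: milestones a builder might emit for "go to the mug and pick it up".
tracker = MilestoneTracker([
    Milestone("navigation", "mug"),
    Milestone("interaction", "mug"),
])
tracker.update(Observation(visible_objects={"table"}))   # mug not seen yet
tracker.update(Observation(visible_objects={"mug"}))     # navigation met
tracker.update(Observation(visible_objects={"mug"}, last_interaction="mug"))
```

Because the tracker only consumes observations and exposes the current milestone, it can sit alongside any base agent, which is the sense in which such a tracker would be model-agnostic.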
