Paper Title

Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration

Paper Authors

Xiwen Liang, Fengda Zhu, Lingling Li, Hang Xu, Xiaodan Liang

Paper Abstract

Vision-language navigation (VLN) is a challenging task due to the large search space in the environment. To address this problem, previous works have proposed methods of fine-tuning a large model pretrained on large-scale datasets. However, conventional fine-tuning methods require extra human-labeled navigation data and lack self-exploration capabilities in environments, which hinders their generalization to unseen scenes. To improve the ability of fast cross-domain adaptation, we propose Prompt-based Environmental Self-exploration (ProbES), which can self-explore environments by sampling trajectories and automatically generate structured instructions via a large-scale cross-modal pretrained model (CLIP). Our method fully utilizes the knowledge learned from CLIP to build an in-domain dataset by self-exploration without human labeling. Unlike the conventional approach of fine-tuning, we introduce prompt-based learning to achieve fast adaptation for language embeddings, which substantially improves the learning efficiency by leveraging prior knowledge. By automatically synthesizing trajectory-instruction pairs in any environment without human supervision, combined with efficient prompt-based learning, our model can adapt to diverse vision-language navigation tasks, including VLN and REVERIE. Both qualitative and quantitative results show that our ProbES significantly improves the generalization ability of the navigation model.
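
The abstract does not give implementation details; the sketch below only illustrates the general idea of labeling sampled viewpoints with CLIP and filling an instruction template. It assumes the OpenAI `clip` package, and the landmark vocabulary, template wording, and function names are hypothetical placeholders, not the paper's actual prompt design.

```python
# Minimal sketch (not the authors' code) of CLIP-based instruction synthesis
# for a self-explored trajectory. Assumes the OpenAI `clip` package:
#   pip install git+https://github.com/openai/CLIP
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical landmark vocabulary; the paper's actual object/room phrases differ.
landmarks = ["a sofa", "a dining table", "a staircase", "a bathroom door", "a bed"]
text_tokens = clip.tokenize(landmarks).to(device)

def label_viewpoint(image_path: str) -> str:
    """Return the landmark phrase that CLIP scores highest for one viewpoint image."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, text_tokens)
        best = logits_per_image.softmax(dim=-1).argmax(dim=-1).item()
    return landmarks[best]

def synthesize_instruction(viewpoint_images: list) -> str:
    """Fill a simple template with CLIP-detected landmarks along a sampled trajectory."""
    labels = [label_viewpoint(p) for p in viewpoint_images]
    return (f"Walk past {labels[0]}, continue toward {labels[len(labels) // 2]}, "
            f"and stop near {labels[-1]}.")

# Usage (hypothetical viewpoint images from one sampled trajectory):
# instruction = synthesize_instruction(["vp0.jpg", "vp1.jpg", "vp2.jpg"])
```

Each synthesized trajectory-instruction pair could then serve as in-domain pretraining data without any human annotation, which is the role such pairs play in ProbES.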
