Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships

Paper title

Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships

Authors

Lv, Yunlian; Xie, Ning; Shi, Yimin; Wang, Zijiao; Shen, Heng Tao

Abstract

Embodied artificial intelligence (AI) tasks are shifting from tasks that focus on internet images to active settings in which embodied agents perceive and act within 3D environments. In this paper, we investigate target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes. The navigation task aims to train an agent that can intelligently make a series of decisions to reach a pre-specified target location from any possible starting position, based only on egocentric views. However, most navigation methods currently struggle with several challenging problems, such as data efficiency, automatic obstacle avoidance, and generalization. The generalization problem means that the agent cannot transfer navigation skills learned from previous experience to unseen targets and scenes. To address these issues, we incorporate two designs into the classic DRL framework: attention on a 3D knowledge graph (KG) and a target skill extension (TSE) module. On the one hand, our proposed method combines visual features and 3D spatial representations to learn the navigation policy. On the other hand, the TSE module is used to generate sub-targets that allow the agent to learn from failures. Specifically, our 3D spatial relationships are encoded through the recently popular graph convolutional network (GCN). Considering real-world settings, our work also considers the open action and adds actionable targets to conventional navigation situations. These more difficult settings are used to test whether the DRL agent really understands its task and navigation environment and can carry out reasoning. Our experiments, performed in AI2-THOR, show that our model outperforms the baselines in both SR and SPL metrics and improves generalization ability across targets and scenes.
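
The abstract reports results in SR and SPL without defining them. The sketch below assumes the definitions commonly used in embodied navigation: SR as the fraction of successful episodes, and SPL as Success weighted by Path Length, where each successful episode contributes the ratio of the shortest path length to the length of the path actually taken. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the agent reached the target."""
    return sum(successes) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over all episodes.

    Assumed definition: (1/N) * sum_i S_i * l_i / max(p_i, l_i), where
    l_i is the shortest path length and p_i the length of the path taken.
    """
    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if not success:
            continue  # failed episodes contribute zero
        longest = max(taken, shortest)
        # Guard against a zero-length episode (start already at the target).
        total += shortest / longest if longest > 0 else 1.0
    return total / len(successes)


# Hypothetical episodes: two successes (one optimal, one twice as long
# as the shortest path) and one failure.
episode_success = [True, True, False]
episode_shortest = [2.0, 4.0, 3.0]
episode_taken = [2.0, 8.0, 5.0]

sr_value = success_rate(episode_success)
spl_value = spl(episode_success, episode_shortest, episode_taken)
```

Under these assumptions SPL can never exceed SR, because each successful episode is scaled by a ratio of at most 1. A model can therefore reach targets often yet still score low on SPL if its paths are inefficient.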

Paper title