Paper Title
A Contextual Bandit Approach for Learning to Plan in Environments with Probabilistic Goal Configurations
Paper Authors
Paper Abstract
Object-goal navigation (Object-nav) entails searching for, recognizing, and navigating to a target object. Object-nav has been extensively studied by the Embodied-AI community, but most solutions are restricted to static objects (e.g., television, fridge, etc.). We propose a modular framework for Object-nav that can efficiently search indoor environments not just for static objects but also for movable objects (e.g., fruits, glasses, phones, etc.) that frequently change position due to human intervention. Our contextual-bandit agent efficiently explores the environment by showing optimism in the face of uncertainty, learning a model of the likelihood of spotting different objects from each navigable location. These likelihoods serve as rewards in a weighted minimum-latency solver that deduces a trajectory for the robot. We evaluate our algorithm in two simulated environments and a real-world setting to demonstrate high sample efficiency and reliability.
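To make the "optimism in the face of uncertainty" idea concrete, here is a minimal sketch of a UCB-style estimator of per-location spotting likelihoods. This is an illustrative toy, not the paper's implementation: the class, method names, and the choice of a simple count-based UCB bonus are all assumptions.

```python
import math
from collections import defaultdict

class OptimisticSpottingModel:
    """Toy UCB-style estimator of the likelihood of spotting an object
    from a navigable location (hypothetical; not the paper's code)."""

    def __init__(self, exploration_weight=1.0):
        self.c = exploration_weight
        self.counts = defaultdict(int)     # visits per (location, object)
        self.successes = defaultdict(int)  # sightings per (location, object)
        self.total = 0                     # total observations so far

    def update(self, location, obj, spotted):
        key = (location, obj)
        self.counts[key] += 1
        self.successes[key] += int(spotted)
        self.total += 1

    def optimistic_likelihood(self, location, obj):
        """Empirical spotting rate plus a UCB exploration bonus.
        Never-visited pairs get the maximum value of 1.0, which is
        what drives the agent to explore them."""
        key = (location, obj)
        n = self.counts[key]
        if n == 0:
            return 1.0  # optimism in the face of uncertainty
        mean = self.successes[key] / n
        bonus = self.c * math.sqrt(math.log(max(self.total, 2)) / n)
        return min(1.0, mean + bonus)

model = OptimisticSpottingModel()
model.update("kitchen", "phone", spotted=True)
model.update("kitchen", "phone", spotted=False)
print(model.optimistic_likelihood("kitchen", "phone"))  # visited pair
print(model.optimistic_likelihood("bedroom", "phone"))  # unvisited -> 1.0
```

In the framework described above, such optimistic likelihoods would then be fed as node rewards into a route solver (here, a weighted minimum-latency formulation) to pick the robot's trajectory; the solver itself is outside the scope of this sketch.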