Paper title
Learning Continuous Control Policies for Information-Theoretic Active Perception
Paper authors
Paper abstract
This paper proposes a method for learning continuous control policies for active landmark localization and exploration using an information-theoretic cost. We consider a mobile robot detecting landmarks within a limited sensing range, and tackle the problem of learning a control policy that maximizes the mutual information between the landmark states and the sensor observations. We employ a Kalman filter to convert the partially observable problem over the landmark states into a Markov decision process (MDP), a differentiable field of view to shape the reward, and an attention-based neural network to represent the control policy. The approach is further unified with active volumetric mapping to promote exploration in addition to landmark localization. The performance is demonstrated in several simulated landmark localization tasks in comparison with benchmark methods.
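The reward described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a single static 2-D landmark with a Gaussian belief, a position sensor with isotropic noise, and a sigmoid-shaped soft field of view. The function names, the `sharpness` parameter, and the noise model are hypothetical choices made for illustration. Under these assumptions, the mutual information between the landmark state and one observation equals half the log-determinant ratio of the Kalman prior and posterior covariances.

```python
import math

def soft_fov_weight(robot_xy, landmark_xy, sensing_range, sharpness=10.0):
    """Smooth stand-in for the indicator 'landmark is inside the sensing range'.

    Assumed form: a sigmoid of (sensing_range - distance), written with tanh
    so that it stays numerically stable for landmarks that are far away.
    """
    d = math.dist(robot_xy, landmark_xy)
    return 0.5 * (1.0 + math.tanh(0.5 * sharpness * (sensing_range - d)))

def posterior_covariance(cov, weight, noise_std):
    """Kalman covariance update in information form for a 2x2 covariance.

    Assumes the observation is the landmark position plus isotropic noise,
    with its information scaled by the soft field-of-view weight:
    inv(post) = inv(cov) + (weight / noise_std**2) * I,
    which is equivalent to post = cov @ inv(I + s * cov).
    """
    s = weight / noise_std ** 2
    m00, m01 = 1.0 + s * cov[0][0], s * cov[0][1]
    m10, m11 = s * cov[1][0], 1.0 + s * cov[1][1]
    det = m00 * m11 - m01 * m10
    inv = [[m11 / det, -m01 / det], [-m10 / det, m00 / det]]
    return [[sum(cov[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def information_reward(robot_xy, landmark_mean, cov, sensing_range, noise_std):
    """Mutual information (in nats) gained from one observation.

    For Gaussian beliefs this is 0.5 * (log det prior - log det posterior).
    Because the field-of-view weight is smooth, the reward varies smoothly
    with the robot position instead of jumping at the sensing boundary.
    """
    w = soft_fov_weight(robot_xy, landmark_mean, sensing_range)
    post = posterior_covariance(cov, w, noise_std)
    return 0.5 * (math.log(det2(cov)) - math.log(det2(post)))

if __name__ == "__main__":
    prior = [[1.0, 0.0], [0.0, 1.0]]
    near = information_reward((0.0, 0.0), (1.0, 0.0), prior, 2.0, 1.0)
    far = information_reward((0.0, 0.0), (5.0, 0.0), prior, 2.0, 1.0)
    print(f"reward near: {near:.4f} nats, reward far: {far:.2e} nats")
```

In this sketch the belief state (landmark mean and covariance) is fully known to the agent, which is what allows the problem to be treated as an MDP over beliefs. A policy network would take that belief and the robot pose as input. The attention-based architecture and the volumetric mapping term mentioned in the abstract are not modeled here.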