Multi-Agent Exploration of an Unknown Sparse Landmark Complex via Deep Reinforcement Learning

Paper Title

Multi-Agent Exploration of an Unknown Sparse Landmark Complex via Deep Reinforcement Learning

Authors

Xiatao Sun, Yuwei Wu, Subhrajit Bhattacharya, Vijay Kumar

Abstract

In recent years, Landmark Complexes have been successfully employed for localization-free and metric-free autonomous exploration using a group of sensing-limited and communication-limited robots in a GPS-denied environment. To ensure rapid and complete exploration, existing works make assumptions on the density and distribution of landmarks in the environment. These assumptions may be overly restrictive, especially in hazardous environments where landmarks may be destroyed or completely missing. In this paper, we first propose a deep reinforcement learning framework for multi-agent cooperative exploration in environments with sparse landmarks while reducing client-server communication. By leveraging recent developments in partial observability and credit assignment, our framework can train the exploration policy efficiently for multi-robot systems. The policy receives individual rewards from actions based on a proximity sensor with limited range and resolution; these are combined with group rewards to encourage collaborative exploration and construction of the Landmark Complex through observation of 0-, 1-, and 2-dimensional simplices. In addition, we employ a three-stage curriculum learning strategy to mitigate reward sparsity by gradually adding random obstacles and destroying random landmarks. Experiments in simulation demonstrate that our method outperforms the state-of-the-art landmark complex exploration method in efficiency across different environments with sparse landmarks.
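The reward structure described in the abstract can be illustrated with a minimal sketch. It assumes that any set of landmarks observed simultaneously by one robot spans a simplex, and that the group reward counts newly observed 0-, 1-, and 2-dimensional simplices. The function names, the per-dimension weights, and the mixing coefficient `alpha` are hypothetical choices made for illustration; they are not taken from the paper.

```python
from itertools import combinations


def simplices_from_observation(visible_landmarks, max_dim=2):
    """Return every simplex of dimension 0..max_dim spanned by landmarks seen at once.

    Assumption: landmarks observed together by one robot span a simplex.
    """
    landmarks = sorted(set(visible_landmarks))
    found = set()
    for k in range(1, max_dim + 2):  # k vertices form a (k-1)-dimensional simplex
        found.update(combinations(landmarks, k))
    return found


def group_reward(known_complex, observation, weights=(1.0, 2.0, 3.0)):
    """Add newly observed simplices to known_complex and return their weighted count.

    weights[d] is a hypothetical reward for one new d-dimensional simplex.
    """
    new = simplices_from_observation(observation) - known_complex
    known_complex |= new  # update the shared complex in place
    return sum(weights[len(s) - 1] for s in new)


def combined_reward(individual, group, alpha=0.5):
    """Blend an individual reward with the group reward (alpha is illustrative)."""
    return alpha * individual + (1.0 - alpha) * group


known = set()
first = group_reward(known, [3, 1, 2])   # 3 vertices, 3 edges, 1 triangle
second = group_reward(known, [2, 3, 4])  # only simplices containing landmark 4 are new
print(first, second, len(known))
```

In this sketch, simplices that are already in the shared complex earn nothing on a repeat observation, so the second observation is rewarded only for the simplices involving the newly seen landmark.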
