Near-Optimal Multi-Agent Learning for Safe Coverage Control

Paper title

Near-Optimal Multi-Agent Learning for Safe Coverage Control

Authors

Manish Prajapat, Matteo Turchetta, Melanie N. Zeilinger, Andreas Krause

Abstract

In multi-agent coverage control problems, agents navigate their environment to reach locations that maximize the coverage of some density. In practice, the density is rarely known $\textit{a priori}$, further complicating the original NP-hard problem. Moreover, in many applications, agents cannot visit arbitrary locations due to $\textit{a priori}$ unknown safety constraints. In this paper, we aim to efficiently learn the density to approximately solve the coverage problem while preserving the agents' safety. We first propose a conditionally linear submodular coverage function that facilitates theoretical analysis. Utilizing this structure, we develop MacOpt, a novel algorithm that efficiently trades off the exploration-exploitation dilemma due to partial observability, and show that it achieves sublinear regret. Next, we extend results on single-agent safe exploration to our multi-agent setting and propose SafeMac for safe coverage and exploration. We analyze SafeMac and give first-of-its-kind results: near-optimal coverage in finite time while provably guaranteeing safety. We extensively evaluate our algorithms on synthetic and real problems, including a biodiversity monitoring task under safety constraints, where SafeMac outperforms competing methods.
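
To make the coverage objective concrete, here is a minimal sketch of the simplest version of the problem: the density is fully known, there are no safety constraints, and agents are placed greedily by marginal gain. This is the textbook greedy heuristic for monotone submodular maximization, not MacOpt or SafeMac. The grid, the square sensing region, and the names `coverage` and `greedy_placement` are assumptions made for illustration only.

```python
import numpy as np


def coverage(density, centers, radius):
    """Total density within `radius` (Chebyshev distance) of any agent.

    Each cell is counted once even if several agents cover it, which is
    what gives the objective its diminishing-returns (submodular) shape.
    """
    rows, cols = np.indices(density.shape)
    covered = np.zeros(density.shape, dtype=bool)
    for r, c in centers:
        covered |= np.maximum(np.abs(rows - r), np.abs(cols - c)) <= radius
    return float(density[covered].sum())


def greedy_placement(density, n_agents, radius):
    """Place agents one at a time at the cell with the largest marginal gain."""
    cells = [(r, c)
             for r in range(density.shape[0])
             for c in range(density.shape[1])]
    centers = []
    for _ in range(n_agents):
        base = coverage(density, centers, radius)
        best = max(
            cells,
            key=lambda cell: coverage(density, centers + [cell], radius) - base,
        )
        centers.append(best)
    return centers


# Hypothetical toy density with two separated hot spots.
density = np.zeros((6, 6))
density[1, 1] = 5.0
density[4, 4] = 3.0

centers = greedy_placement(density, n_agents=2, radius=1)
total = coverage(density, centers, radius=1)
print(centers, total)
```

The setting described in the abstract is harder on both counts: the density must be learned from observations, and agents may only move through locations that are known to be safe.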
