Paper Title
Sub-optimal Policy Aided Multi-Agent Reinforcement Learning for Flocking Control
Paper Authors
Paper Abstract
Flocking control is a challenging problem in which multiple agents, such as drones or vehicles, need to reach a target position while maintaining the flock and avoiding collisions with obstacles in the environment as well as collisions among agents. Multi-agent reinforcement learning has achieved promising performance in flocking control. However, methods based on traditional reinforcement learning require a considerable number of interactions between agents and the environment. This paper proposes a sub-optimal policy aided multi-agent reinforcement learning algorithm (SPA-MARL) to boost sample efficiency. SPA-MARL directly leverages a prior policy, which can be manually designed or obtained with a non-learning method and whose performance may be sub-optimal, to aid agents in learning. SPA-MARL measures the performance gap between the sub-optimal policy and its own policy, and imitates the sub-optimal policy whenever the latter performs better. We apply SPA-MARL to the flocking control problem, using a traditional control method based on artificial potential fields to generate the sub-optimal policy. Experiments demonstrate that SPA-MARL speeds up the training process and outperforms both the MARL baseline and the sub-optimal policy it learns from.
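The abstract only describes the "imitate when the sub-optimal policy is better" mechanism at a high level. The sketch below is one plausible reading of that idea, not the paper's actual loss: the function name spa_style_actor_loss, the use of critic Q-values as the comparison gate, the squared-error imitation term, and the bc_weight coefficient are all illustrative assumptions.

```python
import torch


def spa_style_actor_loss(actor, critic, obs, suboptimal_actions, bc_weight=1.0):
    """Hypothetical actor loss gated by a comparison with a sub-optimal policy.

    actor(obs) -> actions; critic(obs, actions) -> Q estimates of shape [B, 1].
    Where the critic values the sub-optimal (e.g. APF) action above the actor's
    own action, a behaviour-cloning penalty pulls the actor toward that action;
    elsewhere only the usual policy-gradient term applies.
    """
    own_actions = actor(obs)
    q_own = critic(obs, own_actions)              # value of the actor's own action
    q_sub = critic(obs, suboptimal_actions)       # value of the sub-optimal action

    # 1 where the prior policy still looks better, 0 otherwise (no gradient through the gate)
    better = (q_sub > q_own).float().squeeze(-1).detach()

    rl_loss = -q_own.mean()                                         # standard deterministic-policy-gradient term
    bc_term = (own_actions - suboptimal_actions).pow(2).sum(dim=-1)
    bc_loss = (better * bc_term).mean()                             # imitate only on the gated states
    return rl_loss + bc_weight * bc_loss
```

Under this reading, the imitation pressure fades automatically once the learned policy overtakes the prior on a given state, which is consistent with the claim that SPA-MARL can eventually outperform the sub-optimal policy it learns from.

The sub-optimal policy itself comes from a traditional artificial potential field controller. The following is a minimal, self-contained sketch of such a controller in NumPy; the particular force terms, the gains k_att, k_rep, and k_coh, and the safety radius are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np


def apf_velocity(pos, goal, neighbors, obstacles,
                 k_att=1.0, k_rep=2.0, k_coh=0.5,
                 safe_dist=1.5, max_speed=1.0):
    """Velocity command for one agent from a simple artificial potential field.

    pos, goal: (2,) arrays; neighbors, obstacles: sequences of (2,) positions
    (possibly empty). Attraction pulls the agent toward the goal, repulsion
    pushes it away from nearby obstacles and flock-mates, and a weak cohesion
    term holds the flock together.
    """
    pos, goal = np.asarray(pos, float), np.asarray(goal, float)
    force = k_att * (goal - pos)                       # attraction toward the target

    for other in list(neighbors) + list(obstacles):
        diff = pos - np.asarray(other, float)
        dist = np.linalg.norm(diff) + 1e-6
        if dist < safe_dist:                           # repel only inside the safety radius
            force += k_rep * (1.0 / dist - 1.0 / safe_dist) * diff / dist**3

    if len(neighbors) > 0:                             # mild cohesion toward the flock centroid
        force += k_coh * (np.mean(np.asarray(neighbors, float), axis=0) - pos)

    speed = np.linalg.norm(force)
    return force if speed <= max_speed else force * (max_speed / speed)
```

Calling such a controller for every agent at every step yields reference actions that a learner can compare against, which is how a non-learning prior like this can seed training even though its own behaviour is only sub-optimal.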
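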