Paper Title
Hierarchical Reinforcement Learning with Opponent Modeling for Distributed Multi-agent Cooperation
Paper Authors
Paper Abstract
Many real-world applications can be formulated as multi-agent cooperation problems, such as network packet routing and the coordination of autonomous vehicles. The emergence of deep reinforcement learning (DRL) provides a promising approach to multi-agent cooperation through the interaction of agents with their environment. However, traditional DRL solutions suffer from the high dimensionality of policy search when multiple agents act in continuous action spaces. Moreover, the constantly changing policies of the agents make training non-stationary. To tackle these issues, we propose a hierarchical reinforcement learning approach that combines high-level decision-making with low-level individual control for efficient policy search. In particular, the cooperation of multiple agents can be learned efficiently in a high-level discrete action space, while low-level individual control reduces to single-agent reinforcement learning. In addition to hierarchical reinforcement learning, we propose an opponent modeling network that models other agents' policies during the learning process. In contrast to end-to-end DRL approaches, our approach reduces learning complexity by decomposing the overall task into sub-tasks in a hierarchical way. To evaluate the efficiency of our approach, we conduct a real-world case study on a cooperative lane-change scenario. Both simulation and real-world experiments show the superiority of our approach in terms of collision rate and convergence speed.
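To make the described architecture concrete, below is a minimal PyTorch sketch of the two-level decomposition and the opponent modeling network from the abstract. All class names, network sizes, and the interface between the levels (opponent prediction fed to the high-level policy; the chosen discrete option fed to the low-level controller) are illustrative assumptions, not the paper's exact design.

```python
# A hedged sketch of hierarchical RL with opponent modeling.
# All architecture details below are assumptions; the paper does not
# specify its exact networks in the abstract.
import torch
import torch.nn as nn


class OpponentModel(nn.Module):
    """Predicts another agent's action distribution from the local
    observation (an assumed proxy for modeling other agents' policies)."""

    def __init__(self, obs_dim: int, opp_action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, opp_action_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Softmax over the opponent's (assumed discrete) high-level actions.
        return torch.softmax(self.net(obs), dim=-1)


class HighLevelPolicy(nn.Module):
    """Chooses a discrete cooperative decision (e.g., change lane vs. keep
    lane), conditioned on the observation and the opponent prediction."""

    def __init__(self, obs_dim: int, opp_action_dim: int, n_options: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + opp_action_dim, 64), nn.ReLU(),
            nn.Linear(64, n_options),
        )

    def forward(self, obs, opp_pred):
        logits = self.net(torch.cat([obs, opp_pred], dim=-1))
        return torch.distributions.Categorical(logits=logits)


class LowLevelController(nn.Module):
    """Single-agent continuous controller that executes the chosen option,
    here a Gaussian policy over e.g. steering and acceleration."""

    def __init__(self, obs_dim: int, n_options: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_options, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, obs, option_onehot):
        mean = self.net(torch.cat([obs, option_onehot], dim=-1))
        return torch.distributions.Normal(mean, self.log_std.exp())


# One decision step: predict the opponent, pick a discrete option, then
# emit a continuous control action conditioned on that option.
obs_dim, opp_dim, n_options, act_dim = 16, 3, 3, 2
opponent = OpponentModel(obs_dim, opp_dim)
high = HighLevelPolicy(obs_dim, opp_dim, n_options)
low = LowLevelController(obs_dim, n_options, act_dim)

obs = torch.randn(1, obs_dim)
opp_pred = opponent(obs)
option = high(obs, opp_pred).sample()                      # discrete decision
option_onehot = nn.functional.one_hot(option, n_options).float()
action = low(obs, option_onehot).sample()                  # continuous control
```

The point of the decomposition is visible in the shapes: the high-level policy searches a small discrete option space where multi-agent coordination is learned, while the low-level controller is an ordinary single-agent continuous policy, so neither component faces the full joint continuous action space.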