Multi-Agent Reinforcement Learning for Markov Routing Games: A New Modeling Paradigm for Dynamic Traffic Assignment

Paper Title

Multi-Agent Reinforcement Learning for Markov Routing Games: A New Modeling Paradigm for Dynamic Traffic Assignment

Authors

Zhenyu Shou, Xu Chen, Yongjie Fu, Xuan Di

Abstract

This paper aims to develop a paradigm that models both the learning behavior of intelligent agents with a utility-optimizing goal and the system's equilibrating processes in a routing game among atomic selfish agents. The agents include, but are not limited to, autonomous vehicles, connected and automated vehicles, and human-driven vehicles with intelligent navigation systems whose drivers follow the navigation instructions completely. Such a paradigm can assist policymakers in devising optimal operational and planning countermeasures under both normal and abnormal circumstances. To this end, we develop a Markov routing game (MRG) in which each agent learns and updates her own en-route path choice policy while interacting with others in transportation networks. To solve the MRG efficiently, we formulate it as a multi-agent reinforcement learning (MARL) problem and devise a mean field multi-agent deep Q learning (MF-MA-DQL) approach that captures the competition among agents. The linkage between the classical dynamic user equilibrium (DUE) paradigm and our proposed MRG is discussed. We show that the routing behavior of intelligent agents converges to the classical notion of predictive DUE when traffic environments are simulated using dynamic network loading (DNL) models. In other words, the MRG depicts DUEs assuming perfect information and deterministic environments propagated by DNL models. Four examples are solved to illustrate the efficiency of the algorithm and the consistency between DUE and the MRG equilibrium: a simple network without spillback, the same network with spillback, the Ortuzar Willumsen (OW) Network, and a real-world network near Columbia University's campus in Manhattan, New York City.
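The idea of selfish agents learning route choices while interacting through congestion can be illustrated with a deliberately small toy. The sketch below is not the paper's MF-MA-DQL method: it swaps in independent, stateless tabular Q-learning on a hypothetical two-route network, with made-up linear travel-time functions, learning rate, and exploration schedule. Under these assumed costs the equal-travel-time split puts about two thirds of the 100 agents on route A, and the learned flows would be expected to settle near that split, although this toy carries no convergence guarantee.

```python
import random


def travel_times(flow_a, n_agents):
    """Hypothetical linear link costs: each route slows down as more agents use it."""
    flow_b = n_agents - flow_a
    return 10.0 + 0.2 * flow_a, 20.0 + 0.1 * flow_b


def simulate(n_agents=100, episodes=2000, alpha=0.1, seed=0):
    """Independent Q-learners repeatedly choose between route A (0) and route B (1).

    Returns the number of agents on route A in every episode.
    """
    rng = random.Random(seed)
    # q[i][r]: agent i's running estimate of the reward (negative travel time) on route r.
    q = [[0.0, 0.0] for _ in range(n_agents)]
    history = []
    for ep in range(episodes):
        # Assumed exploration schedule: decays from 0.3 down to a floor of 0.02.
        eps = max(0.02, 0.3 * 0.995 ** ep)
        choices = []
        for qa, qb in q:
            if rng.random() < eps or qa == qb:
                choices.append(rng.randrange(2))  # explore, or break a tie at random
            else:
                choices.append(0 if qa > qb else 1)  # exploit the better estimate
        flow_a = choices.count(0)
        times = travel_times(flow_a, n_agents)
        # Each agent only observes the travel time of the route it actually took.
        for i, route in enumerate(choices):
            q[i][route] += alpha * (-times[route] - q[i][route])
        history.append(flow_a)
    return history


if __name__ == "__main__":
    flows = simulate()
    tail = flows[-200:]
    print("mean flow on route A over the last 200 episodes:", sum(tail) / len(tail))
```

Each agent here treats the other agents as part of the environment, which is the simplest form of the competition the abstract describes. A mean field formulation would instead feed each agent a summary of the population's behavior, and a Markov routing game would add state (the agent's current node and time) so that choices are made en route rather than once per trip.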
