Assessment of Reward Functions in Reinforcement Learning for Multi-Modal Urban Traffic Control under Real-World Limitations

Paper title

Assessment of Reward Functions in Reinforcement Learning for Multi-Modal Urban Traffic Control under Real-World Limitations

Authors

Alvaro Cabrejas-Egea, Colm Connaughton

Abstract

Reinforcement Learning is proving a successful tool that can manage urban intersections with a fraction of the effort required to curate traditional traffic controllers. However, literature on the introduction and control of pedestrians at such intersections is scarce. Furthermore, it is unclear which traffic state variables should be used as the reward to obtain the best agent performance. This paper robustly evaluates 30 different Reinforcement Learning reward functions for controlling intersections serving pedestrians and vehicles, covering the main traffic state variables available via modern vision-based sensors. Some rewards proposed in previous literature solely for vehicular traffic are extended to pedestrians, while new ones are introduced. We use a model calibrated in terms of demand, sensors, green times and other operational constraints of a real intersection in Greater Manchester, UK. The assessed rewards can be classified into 5 groups depending on the magnitudes used: queues, waiting time, delay, average speed and throughput in the junction. The performance of different agents, in terms of waiting time, is compared across different demand levels, from normal operation to saturation of traditional adaptive controllers. We find that those rewards maximising the speed of the network obtain the lowest waiting time for vehicles and pedestrians simultaneously, closely followed by queue minimisation, demonstrating better performance than other previously proposed methods.
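
To make the five reward groups named in the abstract more concrete, the following is a minimal Python sketch of one possible reward per group. It is not the paper's formulation: the `LaneState` fields, the function names, the relative-speed definition of delay, and the choice to pool vehicles and pedestrians in a single sum are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LaneState:
    """Hypothetical snapshot of one approach (vehicle lane or pedestrian crossing)."""
    queue_length: int           # users currently stopped on this approach
    waiting_times: List[float]  # seconds each stopped user has been waiting
    speeds: List[float]         # m/s of every user currently on the approach
    free_flow_speed: float      # m/s expected without congestion (assumed > 0)
    departed: int               # users that cleared the junction this step


def queue_reward(lanes: List[LaneState]) -> float:
    """Queues: penalise the total number of stopped users."""
    return -float(sum(lane.queue_length for lane in lanes))


def waiting_time_reward(lanes: List[LaneState]) -> float:
    """Waiting time: penalise the total time users have spent stopped."""
    return -sum(sum(lane.waiting_times) for lane in lanes)


def delay_reward(lanes: List[LaneState]) -> float:
    """Delay: penalise relative speed loss, 1 - v / v_free, per user (assumed definition)."""
    return -sum(
        max(0.0, 1.0 - v / lane.free_flow_speed)
        for lane in lanes
        for v in lane.speeds
    )


def average_speed_reward(lanes: List[LaneState]) -> float:
    """Average speed: reward the mean speed of all observed users."""
    speeds = [v for lane in lanes for v in lane.speeds]
    return sum(speeds) / len(speeds) if speeds else 0.0


def throughput_reward(lanes: List[LaneState]) -> float:
    """Throughput: reward the number of users that cleared the junction."""
    return float(sum(lane.departed for lane in lanes))


# One illustrative reward per group; an agent would be trained with exactly one.
REWARDS: Dict[str, Callable[[List[LaneState]], float]] = {
    "queues": queue_reward,
    "waiting_time": waiting_time_reward,
    "delay": delay_reward,
    "average_speed": average_speed_reward,
    "throughput": throughput_reward,
}
```

In this sketch, quantities to be minimised (queues, waiting time, delay) are returned with a negative sign so that every reward is something the agent maximises. How vehicle and pedestrian terms should be weighted against each other is left open here.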
