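Learning the policy for mixed electric platoon control of automated and human-driven vehicles at signalized intersection: a random search approach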

Paper title

Learning the policy for mixed electric platoon control of automated and human-driven vehicles at signalized intersection: a random search approach

Authors

Jiang, Xia; Zhang, Jian; Shi, Xiaoyu; Cheng, Jian

Abstract

The upgrading and updating of vehicles have accelerated in the past decades. Driven by the need for environmental friendliness and intelligence, electric vehicles (EVs) and connected and automated vehicles (CAVs) have become new components of transportation systems. This paper develops a reinforcement learning framework to implement adaptive control for an electric platoon composed of CAVs and human-driven vehicles (HDVs) at a signalized intersection. First, a Markov Decision Process (MDP) model is proposed to describe the decision process of the mixed platoon. A novel state representation and reward function are designed for the model to account for the behavior of the whole platoon. Second, to deal with the delayed reward, an Augmented Random Search (ARS) algorithm is proposed. The control policy learned by the agent guides the longitudinal motion of the CAV, which serves as the leader of the platoon. Finally, a series of simulations is carried out in the simulation suite SUMO. Compared with several state-of-the-art (SOTA) reinforcement learning approaches, the proposed method obtains a higher reward. Meanwhile, the simulation results demonstrate the effectiveness of the delayed reward, which is designed to outperform a distributed reward mechanism. The sensitivity analysis reveals that, compared with normal car-following behavior, energy savings of different extents (39.27%-82.51%) can be achieved by adjusting the relative importance of the optimization goals. Without increasing travel delay, the proposed control method can save up to 53.64% of electric energy.
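
The abstract names ARS but does not spell out the authors' variant, state representation, or reward. As a rough illustration only, the Python sketch below implements the basic ARS update (the V1-t variant described by Mania, Guy and Recht, 2018, without online state normalization) on a hypothetical toy task: a single leader vehicle must cover a target distance in a fixed number of steps, and its reward is returned only once, at the end of the episode. Because ARS compares perturbed policies using whole-episode returns only, it needs no per-step reward signal, which is one plausible reason a random search method suits a delayed reward. The functions `rollout` and `ars`, the state features, the squared-acceleration energy proxy, and all hyperparameters are assumptions made for this sketch; none of them comes from the paper, and no SUMO interface is involved.

```python
import random
import statistics


def rollout(theta, horizon=10, target=5.0, energy_weight=0.01):
    """Hypothetical toy episode for a single leader vehicle (not the paper's MDP).

    A linear policy maps the state [normalised gap to target, speed, bias] to an
    acceleration clipped to [-1, 1]. The reward is delayed: a single value is
    returned only when the episode ends.
    """
    x, v, energy = 0.0, 0.0, 0.0
    for _ in range(horizon):
        state = ((target - x) / target, v, 1.0)
        a = sum(w * s for w, s in zip(theta, state))
        a = max(-1.0, min(1.0, a))
        v = max(0.0, v + a)  # unit time step; speed cannot become negative
        x += v
        energy += a * a      # assumed proxy for electric energy use
    # Terminal-only reward: reach the target distance, penalise control effort.
    return -(target - x) ** 2 - energy_weight * energy


def ars(rollout_fn, dim, n_iters=200, n_dirs=8, top_b=4,
        step_size=0.01, noise=0.02, seed=0):
    """Basic ARS (V1-t): antithetic perturbations, top-b direction selection,
    and an update scaled by the std of the selected returns."""
    rng = random.Random(seed)
    theta = [0.0] * dim
    for _ in range(n_iters):
        samples = []
        for _ in range(n_dirs):
            delta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            r_plus = rollout_fn([t + noise * d for t, d in zip(theta, delta)])
            r_minus = rollout_fn([t - noise * d for t, d in zip(theta, delta)])
            samples.append((r_plus, r_minus, delta))
        # Keep the top_b directions ranked by max(r+, r-).
        samples.sort(key=lambda s: max(s[0], s[1]), reverse=True)
        best = samples[:top_b]
        # Std of the 2 * top_b selected returns; fall back to 1.0 if all equal.
        sigma_r = statistics.pstdev([r for s in best for r in s[:2]]) or 1.0
        update = [0.0] * dim
        for r_plus, r_minus, delta in best:
            for i in range(dim):
                update[i] += (r_plus - r_minus) * delta[i]
        theta = [t + step_size / (top_b * sigma_r) * u
                 for t, u in zip(theta, update)]
    return theta
```

Under these assumptions, `theta = ars(rollout, dim=3)` returns the learned linear policy weights, and `rollout(theta)` gives its delayed episode return, which should be higher than the -25.0 obtained by the all-zero policy (a vehicle that never accelerates). Normalizing by the standard deviation of the selected returns makes the step size insensitive to the scale of the reward, and keeping only the top-b directions discards perturbations that carried little information.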
