A Deep Reinforcement Learning-Based Charging Scheduling Approach with Augmented Lagrangian for Electric Vehicle

Paper Title

A Deep Reinforcement Learning-Based Charging Scheduling Approach with Augmented Lagrangian for Electric Vehicle

Authors

Chen, Guibin; Shi, Xiaoying

Abstract

This paper addresses the problem of optimizing the charging/discharging schedules of electric vehicles (EVs) when they participate in demand response (DR). Because of uncertainties in an EV's remaining energy, arrival and departure times, and future electricity prices, it is difficult to make charging decisions that minimize charging cost while guaranteeing that the EV's battery state of charge (SOC) stays within a certain range. To handle this dilemma, this paper formulates the EV charging scheduling problem as a constrained Markov decision process (CMDP). By synergistically combining the augmented Lagrangian method and the soft actor-critic algorithm, a novel safe off-policy reinforcement learning (RL) approach is proposed to solve the CMDP. The actor network is updated in a policy gradient manner with the Lagrangian value function. A double-critic network is adopted to synchronously estimate the action-value function and avoid overestimation bias. The proposed algorithm does not require a strong convexity guarantee for the examined problem and is sample efficient. Comprehensive numerical experiments with real-world electricity prices demonstrate that the proposed algorithm can achieve high solution optimality and constraint compliance.
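The abstract does not give the update rules, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions: the SOC requirement is written as a single inequality constraint g <= 0, the penalty uses the textbook augmented Lagrangian form for inequality constraints, and the double critic is read as taking the minimum of two action-value estimates. The function names, the multiplier `lam`, and the penalty coefficient `rho` are hypothetical and are not taken from the paper.

```python
def augmented_lagrangian_penalty(g, lam, rho):
    """Textbook augmented Lagrangian penalty for a constraint g <= 0.

    g   : constraint value (positive means violated), assumed scalar
    lam : current Lagrange multiplier (hypothetical name), lam >= 0
    rho : penalty coefficient (hypothetical name), rho > 0
    """
    shifted = max(0.0, lam + rho * g)
    return (shifted ** 2 - lam ** 2) / (2.0 * rho)


def update_multiplier(g, lam, rho):
    """Dual ascent step: raise lam while violated, clip at zero otherwise."""
    return max(0.0, lam + rho * g)


def clipped_double_q(q1, q2):
    """Take the smaller of two critic estimates to limit overestimation."""
    return min(q1, q2)


if __name__ == "__main__":
    lam, rho = 1.0, 2.0
    g = 0.5  # assumed example: SOC constraint violated by 0.5
    print(augmented_lagrangian_penalty(g, lam, rho))
    print(update_multiplier(g, lam, rho))
    print(clipped_double_q(3.0, 2.5))
```

In an actor-critic loop, a penalty of this kind would be added to the actor's objective and the multiplier updated between policy updates; how the paper couples these steps with the soft actor-critic entropy term is not stated in the abstract.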
