Electric vehicle (EV) charging stations represent a substantial load with significant flexibility. The exploitation of that flexibility in demand response (DR) algorithms becomes increasingly important to manage and balance demand and supply in power grids. Model-free DR based on reinforcement learning (RL) is an attractive approach to balance such EV charging load. We build on previous research on RL, based on a Markov decision process (MDP) to simultaneously coordinate multiple charging stations. However, we note that the computationally expensive cost function adopted in the previous research leads to large training times, which limits the feasibility and practicality of the approach. We, therefore, propose an improved cost function that essentially forces the learned control policy to always fulfill any charging demand that does not offer any flexibility. We rigorously compare the newly proposed batch RL fitted Q-iteration implementation with the original (costly) one, using real-world data. Specifically, for the case of load flattening, we compare the two approaches in terms of (i) the processing time to learn the RL-based charging policy, as well as (ii) the overall performance of the policy decisions in terms of meeting the target load for unseen test data. The performance is analyzed for different training periods and varying training sample sizes. In addition to both RL policies performance results, we provide performance bounds in terms of both (i) an optimal all-knowing strategy, and (ii) a simple heuristic spreading individual EV charging uniformly over time

本研究基于强化学习模型，使用马尔可夫决策过程来协调多个充电站，通过提出改进的成本函数使得学到的控制策略始终满足不具备任何灵活性的充电需求，同时使用真实世界数据对比了两种方法在负荷平整的情况下学习强化策略的性能和策略决策的总体性能。

使用强化学习进行多个电动汽车充电站需求响应协调的优化成本函数