With the development of mobility-on-demand services, increasing sources of rich transportation data, and the advent of autonomous vehicles (AVs), there are significant opportunities for shared-use AV mobility services (SAMSs) to provide accessible and demand-responsive personal mobility. This paper focuses on the problem of anticipatory repositioning of idle vehicles in a SAMS fleet to enable better assignment decisions in serving future demand. The rebalancing problem is formulated as a Markov decision process, and a reinforcement learning approach using an advantage actor-critic (A2C) method is proposed to learn a rebalancing policy that anticipates future demand and cooperates with an optimization-based assignment strategy. The proposed formulation and solution approach allow for centralized repositioning decisions for the entire vehicle fleet while ensuring that the problem size does not change with the size of the fleet. Using an agent-based simulation tool and New York City taxi data to simulate demand for rides in a SAMS system, two versions of the A2C AV repositioning approach are tested: A2C-AVR(A), which observes past demand for rides and learns to anticipate future demand, and A2C-AVR(B), which receives demand forecasts. Numerical experiments demonstrate that the A2C-AVR approaches significantly reduce mean passenger wait times relative to an alternative optimization-based rebalancing approach, at the expense of a slightly higher percentage of empty fleet miles travelled. The experiments show comparable performance between A2C-AVR(A) and A2C-AVR(B), indicating that the approach can anticipate future demand based on past demand observations. In tests with various demand and time-of-day scenarios, and with an alternative assignment strategy, the experiments demonstrate the model's transferability to cases unseen at the training stage.

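This paper addresses the anticipatory repositioning of idle vehicles in shared-use AV mobility services (SAMSs). The problem is modeled as a Markov decision process, and a reinforcement learning approach based on the advantage actor-critic (A2C) method is proposed that cooperates with an optimization-based assignment strategy to learn an anticipatory rebalancing policy. Experiments show that, by observing past demand and anticipating future demand, the approach significantly reduces passenger wait times.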

Anticipatory Fleet Repositioning for Shared-Use Autonomous Mobility Services: An Optimization and Learning-Based Approach