Recent work on ride-sharing order dispatching has highlighted the importance of accounting for both spatial and temporal dynamics in the dispatching process to improve the efficiency of the transportation system. At the same time, deep reinforcement learning has advanced to the point where it achieves superhuman performance in a number of fields. In this work, we propose a deep reinforcement learning based solution for order dispatching, and we conduct large-scale online A/B tests on DiDi's ride-dispatching platform to show that the proposed method achieves significant improvements in both total driver income and user-experience-related metrics. In particular, we model the ride dispatching problem as a semi-Markov decision process to account for the temporal aspect of the dispatching actions. To improve the stability of value iteration with nonlinear function approximators such as neural networks, we propose Cerebellar Value Networks (CVNet) with a novel distributed state representation layer. We further derive a regularized policy evaluation scheme for CVNet that penalizes a large Lipschitz constant of the value network, for additional robustness against adversarial perturbations and noise. Finally, we adapt various transfer learning methods to CVNet for increased learning adaptability and efficiency across multiple cities. We conduct extensive offline simulations based on real dispatching data as well as online A/B tests on DiDi's platform. Results show that CVNet consistently outperforms other recently proposed dispatching methods. We finally show that performance can be further improved through the efficient use of transfer learning.
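To illustrate why a semi-Markov formulation captures the temporal aspect of dispatching, the following minimal sketch computes a temporal-difference target for an action (a trip) that lasts several time steps. It is not taken from the abstract: the function names, the assumption that the trip reward is spread evenly over the trip's duration, and the numbers used are all hypothetical choices made for illustration.

```python
def smdp_discounted_reward(total_reward: float, duration: int, gamma: float) -> float:
    """Discounted value of a reward spread evenly over `duration` time steps.

    Assumption (illustrative only): the trip reward is received in equal
    parts at each of the `duration` steps the action lasts.
    """
    if duration <= 0:
        raise ValueError("duration must be a positive number of time steps")
    per_step = total_reward / duration
    # Sum of per_step * gamma^t for t = 0 .. duration - 1.
    return sum(per_step * gamma**t for t in range(duration))


def smdp_td_target(total_reward: float, duration: int, gamma: float,
                   next_value: float) -> float:
    """TD target for a temporally extended action.

    The value of the state reached after the trip is discounted by
    gamma^duration rather than by a single gamma, which is what
    distinguishes the semi-Markov update from the one-step MDP update.
    """
    return (smdp_discounted_reward(total_reward, duration, gamma)
            + gamma**duration * next_value)


if __name__ == "__main__":
    # A hypothetical trip paying 10.0 over 2 time steps, with gamma = 0.5
    # and an estimated value of 4.0 at the drop-off state.
    print(smdp_td_target(10.0, 2, 0.5, 4.0))
```

With a one-step action (`duration=1`) the target reduces to the familiar `reward + gamma * next_value`, so the sketch is a generalization of the standard update rather than a different rule.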

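This paper proposes a deep reinforcement learning based approach to ride-sharing order dispatching. It models the dispatching problem as a semi-Markov decision process and designs Cerebellar Value Networks (CVNet) with a distributed state representation layer to improve the stability of value iteration with nonlinear function approximators such as neural networks. Online A/B tests and offline simulations show that CVNet has an advantage over other dispatching methods in optimizing total driver income and improving user experience, and that its performance improves further with effective transfer learning.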

A Deep Value Network Based Approach for Multi-Driver Order Dispatching