The vehicle routing problem is a well-known class of NP-hard combinatorial optimisation problems. Traditional solution methods rely on either carefully designed heuristics or time-consuming metaheuristics. Recent work in reinforcement learning offers a promising alternative, but has struggled to match traditional methods in solution quality. This paper proposes a hybrid approach that combines reinforcement learning, policy rollouts, and a satisfiability solver to enable a tunable tradeoff between computation time and solution quality. Results on a popular public data set show that the algorithm produces solutions closer to optimal than existing learning-based approaches, with shorter computation times than metaheuristics. The approach requires minimal design effort and can solve unseen problems of arbitrary scale without additional training. Furthermore, the methodology generalises to other combinatorial optimisation problems.

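This paper proposes a hybrid approach that combines reinforcement learning, policy rollouts, and a satisfiability solver to enable a tunable tradeoff between computation time and solution quality. The approach can solve problems of arbitrary scale without additional training. On the vehicle routing problem it yields better solutions than existing learning-based methods and runs faster than metaheuristics, and it generalises to other problems.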

Solving the Capacitated Vehicle Routing Problem with Time Windows Using Rollouts and MAX-SAT