Recent works using deep learning to solve the Traveling Salesman Problem (TSP) have focused on learning construction heuristics. Such approaches find TSP solutions of good quality but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved until reaching a near-optimal one. In this work, we propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning. We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution. Moreover, we introduce a policy neural network that leverages a pointing attention mechanism, which, unlike previous works, can be easily extended to more general k-opt moves. Our results show that the learned policies can improve even over random initial solutions and approach near-optimal solutions at a faster rate than previous state-of-the-art deep learning methods.
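To make the 2-opt operation mentioned in the abstract concrete, the following is a minimal sketch of a single 2-opt move on a toy instance. It is not the paper's implementation: the function names `tour_length` and `two_opt_move`, the four-city coordinates, and the hand-picked indices are all assumptions made for illustration. In the proposed method, a learned stochastic policy would select the pair of positions; here they are chosen manually.

```python
import math


def tour_length(coords, tour):
    """Length of the closed tour that visits coords in the order given by tour."""
    n = len(tour)
    return sum(
        math.dist(coords[tour[k]], coords[tour[(k + 1) % n]])
        for k in range(n)
    )


def two_opt_move(tour, i, j):
    """Return a new tour with the segment tour[i..j] (inclusive) reversed.

    Reversing the segment removes the edges entering position i and
    leaving position j, and reconnects the tour with two new edges.
    """
    if not 0 <= i < j < len(tour):
        raise ValueError("expected 0 <= i < j < len(tour)")
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


# Hypothetical toy instance: corners of the unit square, visited in an
# order that makes the tour cross itself.
coords = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
tour = [0, 1, 2, 3]

# Indices picked by hand for this sketch; a learned policy would sample them.
improved = two_opt_move(tour, 1, 2)

print(tour, tour_length(coords, tour))
print(improved, tour_length(coords, improved))
```

On this instance the move uncrosses the tour, so the second printed length is shorter than the first. An improvement heuristic applies such moves repeatedly, starting from any initial solution.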

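This paper uses a policy gradient algorithm based on deep reinforcement learning to learn a local search heuristic built on 2-opt operators, and proposes a policy neural network that can be easily extended to more general k-opt moves. Experimental results show that the learned policies approach optimal solutions faster than previous state-of-the-art deep learning methods.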

Learning 2-opt Heuristics for the Traveling Salesman Problem via Deep Reinforcement Learning