Most combinatorial optimization problems can be formulated as mixed integer linear programs (MILPs), for which branch-and-bound (B\&B) is a general and widely used solution method. Recently, learning to branch has become an active research topic at the intersection of machine learning and combinatorial optimization. In this paper, we propose a novel reinforcement learning-based B\&B algorithm. Similar to offline reinforcement learning, we first train on demonstration data, which greatly accelerates learning. As training progresses, the agent gradually begins to interact with the environment using its learned policy. The mixing ratio between demonstration and self-generated data is critical to the algorithm's performance, so we propose a prioritized storage mechanism that controls this ratio automatically. To improve the robustness of training, we additionally introduce a superior network on top of Double DQN, which always serves as a Q-network with competitive performance. We evaluate the proposed algorithm on three public research benchmarks and compare it against strong baselines, including three classical heuristics and one state-of-the-art imitation learning-based branching algorithm. The results show that the proposed algorithm achieves the best performance among the compared algorithms and has the potential to improve B\&B performance continuously.
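The two training ingredients named above can be illustrated with a minimal Python sketch. `double_dqn_targets` computes the standard Double DQN target, in which the online network selects the next action and the target network evaluates it; each row of Q-values is assumed to cover only the valid branching candidates at the next B\&B node, and the superior network is not modeled. `PriorityStore` is a hypothetical reading of a prioritized storage mechanism, not the paper's actual design: demonstration and self-generated transitions compete for the same fixed number of slots, and a new transition evicts the lowest-priority stored one only if its own priority is higher. How priorities are computed (for example, from TD error) is left to the caller and is likewise an assumption.

```python
import heapq
import itertools
from typing import Any, List, Sequence, Tuple


def double_dqn_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    next_q_online: Sequence[Sequence[float]],
    next_q_target: Sequence[Sequence[float]],
    gamma: float = 0.99,
) -> List[float]:
    """Standard Double DQN targets: y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    targets = []
    for r, done, q_online, q_target in zip(rewards, dones, next_q_online, next_q_target):
        if done:
            # Terminal transition (e.g. the B&B solve has finished): no bootstrapping.
            targets.append(r)
            continue
        # Assumption: each row lists Q-values only for valid branching candidates.
        best = max(range(len(q_online)), key=q_online.__getitem__)  # online net selects
        targets.append(r + gamma * q_target[best])  # target net evaluates
    return targets


class PriorityStore:
    """Hypothetical fixed-capacity store (an assumption, not the paper's exact mechanism).

    Demonstration ("demo") and self-generated ("self") transitions share the same
    slots. When full, a new transition replaces the lowest-priority stored one only
    if its own priority is higher, so the demo/self ratio adjusts without a
    hand-set mixing hyperparameter.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # Min-heap of (priority, insertion_id, source, transition); the insertion id
        # breaks priority ties so transitions themselves are never compared.
        self._heap: List[Tuple[float, int, str, Any]] = []
        self._ids = itertools.count()

    def add(self, priority: float, source: str, transition: Any) -> bool:
        """Store a transition; return False if it was rejected."""
        entry = (priority, next(self._ids), source, transition)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True
        if priority > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)  # evict the current lowest priority
            return True
        return False

    def demo_ratio(self) -> float:
        """Fraction of stored transitions that come from demonstrations."""
        if not self._heap:
            return 0.0
        return sum(1 for e in self._heap if e[2] == "demo") / len(self._heap)
```

Under this assumed design no explicit ratio hyperparameter is needed: once self-generated transitions earn higher priorities than the stored demonstrations, they displace them and `demo_ratio()` falls accordingly.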

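This paper proposes a branch-and-bound algorithm based on deep reinforcement learning. The algorithm combines offline imitation learning with self-generated data and introduces a prioritized storage mechanism that controls the mixing ratio between the two, thereby improving performance. The proposed algorithm is evaluated on three public research benchmarks and compared with three classical heuristics and one state-of-the-art imitation learning algorithm. The results show that the proposed algorithm performs best and has the potential to continuously improve the performance of branch-and-bound algorithms.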

An Improved Reinforcement Learning Algorithm for Learning to Branch