Combinatorial optimization problems require an exhaustive search to find the
optimal solution. A convenient approach to solving combinatorial optimization
tasks in the form of Mixed Integer Linear Programs is Branch-and-Bound.
Branch-and-Bound solver splits a task into two parts dividing the domain of an
integer variable, then it solves them recursively, producing a tree of nested
sub-tasks. The efficiency of the solver depends on the branchning heuristic
used to select a variable for splitting. In the present work, we propose a
reinforcement learning method that can efficiently learn the branching
heuristic. We view the variable selection task as a tree Markov Decision
Process, prove that the Bellman operator adapted for the tree Markov Decision
Process is contracting in mean, and propose a modified learning objective for
the reinforcement learning agent. Our agent requires less training data and
produces smaller trees compared to previous reinforcement learning methods.

本文介绍了一种基于强化学习的组合优化问题解决方法，使用树马尔可夫决策过程视角选择变量分支，并且优化学习对象，相较之前的强化学习方法，训练数据量更小，生成树的大小更小。