Recently, deep reinforcement learning (DRL) has shown promise in solving combinatorial optimization (CO) problems. However, DRL solvers often require a large number of evaluations of the objective function, which can be time-consuming in real-world scenarios. To address this issue, we propose a "free" technique that enhances the performance of any DRL solver by exploiting symmetry, without requiring additional objective function evaluations. Our key idea is to augment the training of DRL-based CO solvers with reward-preserving transformations. The proposed algorithm is likely to be impactful because it is simple, easy to integrate with existing solvers, and applicable to a wide range of CO tasks. Extensive empirical evaluations on NP-hard routing optimization, scheduling optimization, and de novo molecular optimization confirm that our method improves the sample efficiency of state-of-the-art DRL algorithms. Our source code is available at https://github.com/kaist-silab/sym-rd.
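
To make the notion of a reward-preserving transformation concrete, the following is a minimal sketch for one routing example, the travelling salesman problem. It is an illustration under stated assumptions, not the implementation from the linked repository: the helper names `tour_length` and `symmetric_variants` are hypothetical, and the only symmetries used are rotating the start city and reversing the direction of a closed tour, both of which leave the tour length unchanged. How the resulting pairs are fed into a solver's training loop is deliberately left out.

```python
import math
import random


def tour_length(coords, tour):
    """Objective: total length of the closed tour (the costly evaluation)."""
    n = len(tour)
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
        for i in range(n)
    )


def symmetric_variants(tour, k, rng):
    """Sample k reward-preserving transformations of a closed tour.

    Hypothetical helper: rotating the start city or reversing the
    direction does not change the tour length, so every variant can
    reuse the reward computed for the original tour.
    """
    n = len(tour)
    variants = []
    for _ in range(k):
        shift = rng.randrange(n)
        variant = tour[shift:] + tour[:shift]  # rotate the start city
        if rng.random() < 0.5:
            variant = variant[::-1]  # reverse the travel direction
        variants.append(variant)
    return variants


rng = random.Random(0)
coords = [(rng.random(), rng.random()) for _ in range(6)]
tour = list(range(6))

# A single objective evaluation for the sampled solution.
reward = -tour_length(coords, tour)

# Additional (solution, reward) training pairs obtained without
# calling the objective function again.
augmented = [(variant, reward) for variant in symmetric_variants(tour, 4, rng)]
```

In this sketch, one call to the objective yields five usable training pairs (the original tour plus four variants), which is the sense in which the extra exploration costs no additional evaluations.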

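The paper proposes a "free" technique that exploits symmetry to enhance the performance of any deep reinforcement learning (DRL) based solver without requiring additional objective function evaluations. The method augments DRL training with reward-preserving transformations, and extensive empirical evaluations on NP-hard routing optimization, scheduling optimization, and de novo molecular optimization show that it improves sample efficiency.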

Symmetric Exploration in Combinatorial Optimization is Free!