Learned construction heuristics for scheduling problems have become
increasingly competitive with established solvers and heuristics in recent
years. In particular, significant improvements have been observed in solution
approaches using deep reinforcement learning (DRL). While much attention has
been paid to the design of network architectures and training algorithms to
achieve state-of-the-art results, little research has investigated the optimal
use of trained DRL agents during inference. Our work is based on the hypothesis
that, similar to search algorithms, the utilization of trained DRL agents
should depend on the acceptable computational budget. We propose a simple
yet effective parameterization, called $\delta$-sampling, that manipulates the
trained action vector to bias agent behavior towards exploration or
exploitation during solution construction. By following this approach, we can
achieve a more comprehensive coverage of the search space while still
generating an acceptable number of solutions. In addition, we propose an
algorithm for obtaining the optimal parameterization for a given number of
solutions and any given trained agent. Experiments extending existing training
protocols for job shop scheduling problems with our inference method validate
our hypothesis and result in the expected improvements of the generated
solutions.
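
The idea of biasing a trained agent's action distribution towards exploration or exploitation can be illustrated with a minimal sketch. The abstract does not state the formula behind $\delta$-sampling, so the code below assumes, purely for illustration, that $\delta$ acts as an exponent on the action probabilities; the function name `delta_sample` and this power-scaling rule are hypothetical and not taken from the paper.

```python
import numpy as np


def delta_sample(probs, delta, rng):
    """Sample an action after reshaping the policy's action probabilities.

    Assumption (not from the paper): delta is used as an exponent.
    delta > 1 sharpens the distribution (more exploitation),
    0 < delta < 1 flattens it (more exploration),
    delta == 1 leaves the trained distribution unchanged.
    """
    probs = np.asarray(probs, dtype=float)
    reshaped = probs ** delta          # rescale each action probability
    reshaped /= reshaped.sum()         # renormalize so the values sum to 1
    action = rng.choice(len(reshaped), p=reshaped)
    return action, reshaped


rng = np.random.default_rng(0)
policy_output = [0.5, 0.3, 0.2]        # hypothetical action probabilities
for delta in (0.5, 1.0, 2.0):
    _, reshaped = delta_sample(policy_output, delta, rng)
    print(delta, np.round(reshaped, 3))
```

Under this assumption, a larger sampling budget would favor a smaller $\delta$ (more diverse solutions), while a small budget would favor a larger $\delta$ (solutions closer to the greedy one). For any positive $\delta$, actions with zero probability, such as masked actions, remain at zero.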

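An optimized parameterization for inference with trained deep reinforcement learning agents: by adjusting the trained action vector, it biases the agent towards exploration or exploitation during solution construction, and thereby generates more acceptable solutions under a limited computational budget.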

Beyond Training: Optimizing Reinforcement Learning Based Job Shop Scheduling Through Adaptive Action Sampling

The job-shop scheduling problem (JSP) is a mathematical optimization problem
widely used in industries like manufacturing, and the flexible JSP (FJSP) is a
common variant. Since both are NP-hard, finding the optimal solution for all
cases within a reasonable time is intractable. Thus, it is important to
develop efficient heuristics to solve JSP/FJSP. One class of methods for solving
scheduling problems is construction heuristics, which construct scheduling
solutions via heuristics. Recently, many methods for construction heuristics
leverage deep reinforcement learning (DRL) with graph neural networks (GNN). In
this paper, we propose a new approach, named residual scheduling, to solving
JSP/FJSP. In this new approach, we remove irrelevant machines and jobs such as
those finished, such that the states include the remaining (or relevant)
machines and jobs only. Our experiments show that our approach reaches
state-of-the-art (SOTA) among all known construction heuristics on most
well-known open JSP and FJSP benchmarks. In addition, we also observe that even
though our model is trained for scheduling problems of smaller sizes, our
method still performs well for scheduling problems of large sizes.
Interestingly, in our experiments, our approach even reaches a zero gap for 49
of the 50 JSP instances with more than 150 jobs on 20 machines.
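
The state reduction described above, which drops finished jobs and machines that are no longer needed, can be sketched roughly as follows. This is only an illustration of the filtering idea: the `Operation` type, the `residual_state` function, and the dictionary layout are hypothetical and do not reflect the paper's graph representation or its GNN-based model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    job: int
    machine: int
    duration: int


def residual_state(jobs, next_op):
    """Keep only the unfinished part of a JSP instance.

    jobs:    dict mapping job id -> list of Operation in processing order
    next_op: dict mapping job id -> index of the next unscheduled operation
    Returns the remaining operations per unfinished job and the set of
    machines that those remaining operations still require.
    """
    remaining = {
        job_id: ops[next_op[job_id]:]
        for job_id, ops in jobs.items()
        if next_op[job_id] < len(ops)  # drop jobs that are fully scheduled
    }
    machines = {op.machine for ops in remaining.values() for op in ops}
    return remaining, machines


# Hypothetical instance: job 1 is finished, job 0 has one operation left.
jobs = {
    0: [Operation(0, 0, 3), Operation(0, 1, 2)],
    1: [Operation(1, 1, 4)],
}
remaining, machines = residual_state(jobs, next_op={0: 1, 1: 1})
print(remaining)  # only job 0's last operation remains
print(machines)   # only machine 1 is still relevant
```

In this sketch the state shrinks as the schedule is constructed, which is one plausible reading of why a model trained on small instances could still apply to larger ones.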

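This paper proposes a new approach, called "residual scheduling", for solving the job-shop scheduling problem and the flexible job-shop scheduling problem. Experiments show that the approach reaches the state of the art on most well-known open JSP and FJSP benchmarks. The study also observes that, although the model is trained on smaller scheduling problems, it still performs well on large ones. Interestingly, in the experiments the approach even reaches a zero gap on 49 of the 50 JSP instances with more than 150 jobs on 20 machines.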

Residual Scheduling: A New Reinforcement Learning Approach to Solving Job Shop Scheduling Problem

We present a novel deep reinforcement learning method to learn construction
heuristics for vehicle routing problems. Specifically, we propose a
Multi-Decoder Attention Model (MDAM) to train multiple diverse policies, which
effectively increases the chance of finding good solutions compared with
existing methods that train only one policy. A customized beam search strategy
is designed to fully exploit the diversity of MDAM. In addition, we propose an
Embedding Glimpse layer in MDAM based on the recursive nature of construction,
which can improve the quality of each policy by providing more informative
embeddings. Extensive experiments on six different routing problems show that
our method significantly outperforms the state-of-the-art deep learning based
models.

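This study proposes a new deep reinforcement learning method for learning construction heuristics for vehicle routing problems. Experimental results show that the method significantly outperforms existing deep-learning-based models.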