Scheduling problems pose significant challenges in resource, industrial, and operational management. This paper addresses the Unrelated Parallel Machine Scheduling Problem (UPMS) with setup times and resources using a Multi-Agent Reinforcement Learning (MARL) approach. The study introduces the Reinforcement Learning environment and conducts empirical analyses comparing MARL with Single-Agent algorithms. The experiments employ various deep neural network policies for both Single-Agent and Multi-Agent approaches. Results demonstrate the efficacy of the Maskable extension of the Proximal Policy Optimization (PPO) algorithm in Single-Agent scenarios and of the Multi-Agent PPO algorithm in Multi-Agent setups. While Single-Agent algorithms perform adequately in reduced scenarios, Multi-Agent approaches face challenges in cooperative learning but show the capacity to scale. This research contributes insights into applying MARL techniques to scheduling optimization, emphasizing the need to balance algorithmic sophistication with scalability in intelligent scheduling solutions.

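This study addresses the Unrelated Parallel Machine Scheduling Problem with setup times and resources using a Multi-Agent Reinforcement Learning (MARL) approach. An empirical analysis compares MARL with Single-Agent algorithms and finds that the Multi-Agent PPO algorithm shows scalability in Multi-Agent settings, despite facing challenges in cooperative learning. The study offers a new perspective on applying MARL techniques to scheduling optimization and emphasizes the balance between algorithmic complexity and scalability.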

Exploring the Application of Multi-Agent Reinforcement Learning to Unrelated Parallel Machine Scheduling