Most solutions to the inventory management problem assume a degree of information centralization that is incompatible with the organizational constraints of real supply chain networks. The inventory management problem is a well-known planning problem in operations research: it asks for the optimal reorder policy for each node in a supply chain. Although many centralized solutions exist, they do not apply to real-world supply chains made up of independent entities. The problem can, however, be naturally decomposed into sub-problems, each associated with one independent entity, which turns it into a multi-agent system. A decentralized, data-driven solution based on multi-agent reinforcement learning is therefore proposed, in which each entity is controlled by its own agent. Three multi-agent variants of the proximal policy optimization (PPO) algorithm are investigated through simulations of different supply chain networks and levels of uncertainty. The centralized training, decentralized execution (CTDE) framework is used: information is centralized offline during simulation-based policy identification, but each policy runs in a decentralized manner once deployed online to the real system. Results show that multi-agent PPO with a centralized critic performs very close to a centralized data-driven solution and outperforms a distributed model-based solution in most cases, while respecting the information constraints of the system.
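The split between centralized training and decentralized execution can be made concrete with a small sketch. The Python below is a minimal illustration under assumptions that are not taken from the source: a serial chain of stages (index 0 is the retailer), a one-period shipping lead time, an unconstrained external supplier feeding the most upstream stage, linear holding and backlog costs, and hypothetical toy functions (`actor`, `critic`, `step`) standing in for trained neural networks. Each actor sees only its own entity's local observation, whereas the critic, used only during offline training, sees the concatenated global state. `ppo_clip_term` is the standard per-sample clipped surrogate objective of PPO, not necessarily the exact loss used in this work.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    inventory: float  # on-hand stock
    backlog: float    # demand received but not yet shipped
    incoming: float   # shipment that arrives at the start of the next period


def local_obs(stage: Stage) -> List[float]:
    # Decentralized execution: an actor sees only its own entity's state.
    return [stage.inventory, stage.backlog, stage.incoming]


def global_state(stages: List[Stage]) -> List[float]:
    # Centralized training: the critic may see every entity's state at once.
    return [x for s in stages for x in local_obs(s)]


def actor(obs: List[float], target: float = 20.0) -> float:
    # Hypothetical stand-in for a trained policy: order up to a target position.
    inventory, backlog, incoming = obs
    return max(0.0, target - (inventory - backlog + incoming))


def critic(state: List[float], weights: List[float]) -> float:
    # Hypothetical linear value function over the joint state (offline only).
    return sum(w * x for w, x in zip(weights, state))


def step(stages: List[Stage], customer_demand: float,
         holding: float = 1.0, backlog_cost: float = 2.0) -> float:
    """Advance the serial chain by one period and return the total cost."""
    orders = [actor(local_obs(s)) for s in stages]  # every agent acts locally
    shipped = []
    for i, s in enumerate(stages):
        # Stage 0 faces customer demand; stage i faces the order of stage i-1.
        demand = customer_demand if i == 0 else orders[i - 1]
        s.inventory += s.incoming  # receive the shipment due this period
        due = demand + s.backlog
        sent = min(s.inventory, due)
        s.inventory -= sent
        s.backlog = due - sent
        shipped.append(sent)
    for i, s in enumerate(stages):
        # Upstream shipments arrive next period; the external supplier of the
        # last stage is assumed to fill its order completely.
        s.incoming = shipped[i + 1] if i + 1 < len(stages) else orders[i]
    return sum(holding * s.inventory + backlog_cost * s.backlog for s in stages)


def ppo_clip_term(ratio: float, advantage: float, eps: float = 0.2) -> float:
    # Per-sample PPO clipped surrogate objective (to be maximized).
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    chain = [Stage(10.0, 0.0, 5.0) for _ in range(3)]
    weights = [-0.1] * 9  # hypothetical critic parameters
    s0 = global_state(chain)
    cost = step(chain, customer_demand=8.0)  # cost == 27.0 for this start state
    # One-step advantage from the centralized critic, with reward = -cost.
    advantage = -cost + 0.99 * critic(global_state(chain), weights) - critic(s0, weights)
    print(cost, ppo_clip_term(ratio=1.5, advantage=advantage))
```

Because `critic` is only ever called on `global_state`, it can be discarded after training; at deployment each entity runs `actor(local_obs(...))` on its own data alone, which is what keeps execution decentralized.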

Analysis of Multi-Agent Reinforcement Learning in Decentralized Inventory Control Systems