In this paper, we introduce novel gradient-based optimization methods for
state-based potential games (SbPGs) within self-learning distributed production
systems. SbPGs are recognized for their efficacy in enabling self-optimizing
distributed multi-agent systems and offer a proven convergence guarantee, which
facilitates collaborative player efforts towards global objectives. Our study
strives to replace conventional ad-hoc random exploration-based learning in
SbPGs with contemporary gradient-based approaches, which aim for faster
convergence and smoother exploration dynamics, thereby shortening training
duration while upholding the efficacy of SbPGs. Moreover, we propose three
distinct variants for estimating the objective function of gradient-based
learning, each developed to suit the unique characteristics of the systems
under consideration. To validate our methodology, we apply it to a laboratory
testbed, the Bulk Good Laboratory Plant, which represents a smart and
flexible distributed multi-agent production system. The incorporation of
gradient-based learning in SbPGs reduces training times and achieves better
policies than the baseline.

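In this paper, we introduce a gradient-based optimization method for state-based potential games (SbPGs) in self-learning distributed production systems. Our study aims to replace conventional random-exploration-based learning with modern gradient-based methods to achieve faster convergence and smoother exploration dynamics, thereby shortening training time while preserving the effectiveness of SbPGs. In addition, we propose three different variants for estimating the objective function of gradient-based learning, each developed for the particular characteristics of the systems under consideration. We validate our approach by applying gradient-based learning to a laboratory testbed, the Bulk Good Laboratory Plant, which represents a smart and flexible distributed multi-agent production system. Incorporating gradient-based learning into SbPGs shortens training time and yields better policies than the baseline.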

Gradient-based Learning in State-based Potential Games for Self-Learning Production Systems
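
The contrast the abstract draws between random exploration and gradient-based learning can be illustrated with a small toy. This is only a sketch under strong assumptions, not the authors' algorithm: the game is stateless, both players share the same utility (which makes it a potential game), and the names `TARGET`, `potential`, `random_exploration`, and `gradient_learning` are hypothetical.

```python
import numpy as np

TARGET = np.array([0.3, 0.7])  # hypothetical optimum, one entry per player


def potential(actions):
    # Global potential; in this toy every player's utility equals it.
    return -np.sum((actions - TARGET) ** 2)


def random_exploration(steps, rng, scale=0.2):
    # One player at a time tries a random perturbation and keeps it
    # only if the potential improves.
    actions = np.zeros(2)
    for _ in range(steps):
        i = rng.integers(2)
        trial = actions.copy()
        trial[i] = np.clip(trial[i] + rng.normal(0.0, scale), 0.0, 1.0)
        if potential(trial) > potential(actions):
            actions = trial
    return actions


def gradient_learning(steps, lr=0.2):
    # Each player follows the partial derivative of the potential
    # with respect to its own action.
    actions = np.zeros(2)
    for _ in range(steps):
        grad = -2.0 * (actions - TARGET)
        actions = np.clip(actions + lr * grad, 0.0, 1.0)
    return actions


rand_actions = random_exploration(200, np.random.default_rng(0))
grad_actions = gradient_learning(30)
print("random exploration:", rand_actions, potential(rand_actions))
print("gradient learning: ", grad_actions, potential(grad_actions))
```

In this toy the gradient update moves every player smoothly toward the optimum at each step, whereas the random variant only advances when a sampled perturbation happens to improve the potential. Whether that difference carries over to a real production system depends on how the objective function is estimated, which is the part the abstract leaves to the three proposed variants.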

While end-to-end learning with fully differentiable models has enabled
tremendous success in natural language processing (NLP) and machine learning,
there has been significant recent interest in learning with latent discrete
structures to incorporate better inductive biases for improved end-task
performance and better interpretability. This paradigm, however, is not
straightforwardly amenable to the mainstream gradient-based optimization
methods. This work surveys three main families of methods to learn such models:
surrogate gradients, continuous relaxation, and marginal likelihood
maximization via sampling. We conclude with a review of applications of these
methods and an inspection of the learned latent structure that they induce.

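This paper reviews methods for learning discrete structures in natural language processing and machine learning in order to improve model performance and interpretability. It surveys three main families of methods: surrogate gradients, continuous relaxation, and marginal likelihood maximization via sampling. It concludes with a review of applications of these methods and an inspection of the learned latent structures.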
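
The three families named above can be sketched on a single categorical latent variable. The following is a minimal illustration, not code from the survey: the toy loss, `COSTS`, `TARGET`, and all function names are assumptions. The surrogate gradient is shown as a straight-through estimator, the relaxation as a Gumbel-softmax sample, and the sampling-based family is represented by a score-function (REINFORCE-style) estimate of the gradient of the expected loss, used here as a simple stand-in.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def softmax_jacobian(p):
    # d softmax_i / d logit_j = p_i * (delta_ij - p_j); symmetric matrix.
    return np.diag(p) - np.outer(p, p)


# Hypothetical toy setup: a 3-way latent choice and a nonlinear loss.
COSTS = np.array([1.0, 2.0, 4.0])
TARGET = 2.0


def loss(z):
    return (z @ COSTS - TARGET) ** 2


def loss_grad_z(z):
    return 2.0 * (z @ COSTS - TARGET) * COSTS


def exact_grad(theta):
    # Gradient of E_z[loss(z)], computed by enumerating every one-hot z.
    p = softmax(theta)
    f = np.array([loss(row) for row in np.eye(len(p))])
    return p * (f - p @ f)


def straight_through_grad(theta, rng):
    # Surrogate gradient: forward pass uses a hard one-hot sample,
    # backward pass pretends z was softmax(theta).
    p = softmax(theta)
    z_hard = np.eye(len(p))[rng.choice(len(p), p=p)]
    return softmax_jacobian(p) @ loss_grad_z(z_hard)


def gumbel_softmax_grad(theta, rng, tau=0.5):
    # Continuous relaxation: differentiate through a soft sample.
    g = rng.gumbel(size=theta.shape)
    z_soft = softmax((theta + g) / tau)
    grad = softmax_jacobian(z_soft) @ loss_grad_z(z_soft) / tau
    return grad, z_soft


def score_function_grad(theta, rng, n_samples=100_000):
    # Sampling-based: average loss(z) * d log p(z) / d theta over samples.
    p = softmax(theta)
    ks = rng.choice(len(p), size=n_samples, p=p)
    f = np.array([loss(row) for row in np.eye(len(p))])
    onehots = np.eye(len(p))[ks]
    return ((onehots - p) * f[ks][:, None]).mean(axis=0)


theta = np.array([0.5, 0.0, -0.5])
rng = np.random.default_rng(0)
g_exact = exact_grad(theta)
g_st = straight_through_grad(theta, rng)
g_relax, z_soft = gumbel_softmax_grad(theta, rng)
g_score = score_function_grad(theta, rng)
print("exact:           ", g_exact)
print("straight-through:", g_st)
print("gumbel-softmax:  ", g_relax)
print("score function:  ", g_score)
```

The score-function estimate is unbiased, so with many samples it approaches the enumerated gradient. The straight-through and relaxed estimates are generally biased but need only one sample each, which is the usual trade-off between these families.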