BriefGPT.xyz
May, 2024
Policy Iteration for Pareto-Optimal Policies in Stochastic Stackelberg Games
Mikoto Kudo, Yohei Akimoto
TL;DR
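For general-sum stochastic games, this work introduces Pareto optimality as an alternative solution concept to the equilibrium, proposes a policy improvement theorem for stochastic games based on best responses, and presents an iterative algorithm for finding Pareto-optimal policies. It proves that the method improves monotonically and converges, and that in special cases it converges to an equilibrium.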
Abstract
In general-sum stochastic games, a stationary Stackelberg equilibrium (SSE) does not always exist, in which the leader maximizes the leader's return for all initial states when the follower takes the best respons…