BriefGPT.xyz
Dec, 2023
Two-Timescale Q-Learning with Function Approximation in Zero-Sum Stochastic Games
Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, Adam Wierman
TL;DR
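We propose a two-timescale Q-learning algorithm with function approximation for two-player zero-sum stochastic games. The algorithm is payoff-based, convergent, rational, and symmetric between the two players, and it is aimed at finding a Nash equilibrium. In the special case of linear function approximation, we establish a finite-sample bound, which gives a polynomial upper bound on the number of samples needed to converge to a Nash equilibrium in this class of stochastic games.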
Abstract
We consider two-player zero-sum stochastic games and propose a two-timescale $Q$-learning algorithm with function approximation that is payoff-based, convergent, rational, and symmetric between the two players. I