BriefGPT.xyz
Oct, 2018
Bandit learning in concave $N$-person games
Mario Bravo, David S. Leslie, Panayotis Mertikopoulos
TL;DR
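Studies the long-run behavior of learning with bandit feedback in non-cooperative concave games, proves that no-regret learning based on mirror descent converges to Nash equilibrium with probability 1 under a standard monotonicity condition, and derives an upper bound on its rate of convergence.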
Abstract
This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even …