October 2018

Bandit learning in concave $N$-person games

Mario Bravo, David S. Leslie, Panayotis Mertikopoulos

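TL;DR: Studies the long-run behavior of learning with bandit feedback in non-cooperative concave games. It proves that no-regret learning based on mirror descent converges to Nash equilibrium with probability 1 under a standard monotonicity condition, and it derives an upper bound on the rate of convergence.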
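The following is a minimal sketch of the kind of scheme summarized above. It is not the authors' algorithm. Each player runs projected gradient ascent (mirror descent with the Euclidean regularizer) on a gradient estimate built from a single observed payoff value per round. Several pieces are assumptions of this example: the two-player quadratic game, the helper names `payoff` and `bandit_gradient_play`, the schedules $\gamma_n = \gamma/n$ and $\delta_n = \delta/n^{1/3}$, and keeping each pivot inside the shrunk interval $[\delta_n, 1-\delta_n]$ so that every perturbed query stays feasible.

```python
import numpy as np

# Hypothetical two-player game on [0, 1] x [0, 1] (an assumption of this sketch,
# not an example from the paper). Player i's payoff is
#   u_i(x) = -(x_i - 0.4)**2 - 0.25 * (x_i - 0.4) * (x_j - 0.4),
# which is concave in x_i. The game is strictly monotone, its unique Nash
# equilibrium is x* = (0.4, 0.4), and both payoffs equal 0 there.
X_STAR = 0.4


def payoff(i, x):
    """Payoff of player i (0 or 1) at the strategy profile x."""
    e = x - X_STAR
    j = 1 - i
    return -e[i] ** 2 - 0.25 * e[i] * e[j]


def bandit_gradient_play(n_iters=20000, gamma=0.5, delta=0.25, seed=0):
    """Projected gradient ascent (Euclidean mirror descent) in which each player
    sees only its own realized payoff and builds a one-point gradient estimate."""
    rng = np.random.default_rng(seed)
    x = np.array([0.5, 0.5])  # initial pivot strategies, inside [delta, 1 - delta]
    for n in range(1, n_iters + 1):
        gamma_n = gamma / n             # step size
        delta_n = delta / n ** (1 / 3)  # exploration radius
        # Random direction on the unit sphere of R^1, i.e. +1 or -1 per player.
        z = rng.choice([-1.0, 1.0], size=2)
        # Pivots lie in [delta_n, 1 - delta_n], so the queried profile is feasible.
        x_hat = x + delta_n * z
        # Bandit feedback: player i observes only the number u_i(x_hat).
        v_hat = np.array([payoff(i, x_hat) * z[i] / delta_n for i in range(2)])
        # Gradient step, then projection onto the next shrunk interval.
        delta_next = delta / (n + 1) ** (1 / 3)
        x = np.clip(x + gamma_n * v_hat, delta_next, 1 - delta_next)
    return x


x = bandit_gradient_play()
print(x)
```

The printed profile should lie close to the equilibrium $(0.4, 0.4)$. The schedule is chosen so that $\sum_n \gamma_n$ diverges while $\sum_n \gamma_n \delta_n$ and $\sum_n \gamma_n^2/\delta_n^2$ converge. This balances the bias from exploring at radius $\delta_n$ against the variance of the $1/\delta_n$-scaled estimate. In this game, payoffs vanish at equilibrium, which keeps that variance small. With general payoffs, one-point estimates are much noisier, and convergence is correspondingly slower.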

Abstract

This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even