BriefGPT.xyz
Nov, 2023
Optimal Cooperative Multiplayer Learning Bandits with Noisy Rewards and No Communication
William Chang, Yuanhao Lu
TL;DR
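This paper studies strategy selection under restricted communication in cooperative multiplayer bandit learning. Using an algorithm based on upper and lower confidence bounds, it resolves the action-selection problem caused by information asymmetry and achieves logarithmic and square-root regret bounds.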
Abstract
We consider a cooperative multiplayer bandit learning problem where the players are only allowed to agree on a strategy beforehand, but cannot communicate during the learning process. In this problem, each player
→