BriefGPT.xyz
Oct, 2018
Decentralized Cooperative Stochastic Multi-armed Bandits
David Martínez-Rubio, Varun Kanade, Patrick Rebeschini
TL;DR
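This paper studies decentralized cooperation over a network for the multi-armed bandit problem. It uses an accelerated consensus procedure so that agents can estimate, for each arm, the average reward observed across all agents, and it makes decisions with upper confidence bounds. The algorithm achieves improved regret bounds while requiring little information about the underlying network.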
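The two ingredients named above, consensus averaging and an upper-confidence-bound index, can be illustrated with a minimal Python sketch. This is not the paper's algorithm: it swaps in plain (unaccelerated) gossip averaging for the accelerated consensus procedure and uses a standard UCB1-style bonus. The weight matrix `W`, the three-agent path graph, and the function names `gossip_average` and `ucb_index` are all assumptions made for illustration.

```python
import math


def gossip_average(values, weights, rounds):
    """Plain (unaccelerated) gossip averaging.

    Each round, every agent replaces its value with a weighted average of
    the values held by itself and its neighbours. With a symmetric, doubly
    stochastic weight matrix on a connected graph, all values approach the
    network-wide mean.
    """
    n = len(values)
    x = list(values)
    for _ in range(rounds):
        x = [sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


def ucb_index(mean_estimate, pulls, t):
    """UCB1-style index: estimated mean plus an exploration bonus.

    Assumption: the classical sqrt(2 ln t / pulls) bonus, not the bonus
    used in the paper.
    """
    if pulls == 0:
        return math.inf  # an arm that was never pulled is tried first
    return mean_estimate + math.sqrt(2.0 * math.log(t) / pulls)


# Hypothetical path graph 0 - 1 - 2 with a symmetric, doubly stochastic matrix.
W = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
]

# Each agent's local empirical mean reward for one arm (made-up numbers).
local_means = [0.2, 0.5, 0.8]

# After enough rounds, every agent holds roughly the network average.
consensus = gossip_average(local_means, W, rounds=50)

# Each agent can then build an index from the shared estimate.
indices = [ucb_index(m, pulls=10, t=100) for m in consensus]
```

In this sketch each agent only needs the weights of its own neighbours, which mirrors the summary's point that little knowledge of the underlying network is required. An accelerated consensus scheme would reach the same average in fewer communication rounds.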
Abstract
We study a decentralized cooperative stochastic multi-armed bandit problem with $K$ arms on a network of $N$ agents. In our model, the reward distribution of each arm is agent-independent. Each agent chooses iter