Decentralized Cooperative Stochastic Bandits

Oct, 2018

Decentralized Cooperative Stochastic Multi-armed Bandits

David Martínez-Rubio, Varun Kanade, Patrick Rebeschini

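TL;DR: This paper studies decentralized cooperation in the multi-armed bandit problem on a network. The algorithm runs an accelerated consensus procedure to estimate, for each arm, the average reward obtained across all agents, and it chooses arms using upper confidence bounds. It achieves improved regret bounds while requiring little information about the underlying network.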
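To make the two ingredients named in the summary concrete, here is a minimal sketch of consensus averaging over a network followed by an upper-confidence-bound index. It is an illustration under stated assumptions, not the paper's algorithm: it uses plain (unaccelerated) consensus rather than the accelerated procedure, a ring network with hand-picked weights, and a textbook UCB1-style bonus. The names `ring_gossip_matrix`, `consensus_average`, and `ucb_index` are hypothetical.

```python
import numpy as np

def ring_gossip_matrix(n):
    """Symmetric, doubly stochastic weights for a ring of n >= 3 agents.
    Assumption: each agent keeps half its value and takes a quarter
    from each of its two neighbours."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] = 0.25
        W[i, (i + 1) % n] = 0.25
    return W

def consensus_average(values, W, rounds):
    """Plain consensus (not the accelerated variant): in every round each
    agent replaces its per-arm values with a weighted average of its own
    and its neighbours' values. Rows are agents, columns are arms."""
    x = np.asarray(values, dtype=float)
    for _ in range(rounds):
        x = W @ x
    return x

def ucb_index(mean_estimate, pulls, t):
    """Textbook UCB1-style index: estimated mean plus an exploration bonus.
    Assumption: the paper's confidence term differs from this one."""
    return mean_estimate + np.sqrt(2.0 * np.log(t) / pulls)

# Illustrative data: 6 agents, 2 arms, one local mean-reward estimate each.
local_means = np.array([
    [0.2, 0.6],
    [0.9, 0.1],
    [0.4, 0.5],
    [0.5, 0.3],
    [0.7, 0.8],
    [0.3, 0.7],
])
W = ring_gossip_matrix(local_means.shape[0])
estimates = consensus_average(local_means, W, rounds=200)

# Each agent now holds (approximately) the network-wide average per arm
# and can pick the arm with the largest index using only local data.
pulls = np.array([10, 10])
chosen_arm = int(np.argmax(ucb_index(estimates[0], pulls, t=100)))
```

Because the weight matrix is doubly stochastic, repeated local averaging preserves the network-wide mean and drives every agent's value toward it; agents only ever exchange values with their direct neighbours. An accelerated scheme reaches the same limit in fewer communication rounds.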

Abstract

We study a decentralized cooperative stochastic multi-armed bandit problem with $K$ arms on a network of $N$ agents. In our model, the reward distribution of each arm is agent-independent. Each agent chooses iter