AbstractWe consider a
decentralized stochastic multi-armed bandit problem with multiple players. Each player aims to maximize his/her own reward by pulling an arm. The arms give rewards based on i.i.d. stochastic Bernoulli distributions. Players are not aware about the
→