We formulate and study a decentralized multi-armed bandit (MAB) problem.
There are M distributed players competing for N independent arms. Each arm,
when played, offers i.i.d. reward according to a distribution with an unknown
parameter. At each time, each player chooses one arm to pla