The multi-armed bandit (MAB) problem is an active learning framework that
aims to select the best action from a set of actions by sequentially observing
rewards. It has recently become popular in a number of applications over
wireless networks, where communication constraints can form a bo