We study a novel variant of the multi-armed bandit problem, where at each
time step, the player observes an independently sampled context that determines
the arms' mean rewards. However, playing an arm blocks it (across all contexts)
for a fixed and known number of future time steps.
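To make the interaction protocol concrete, here is a minimal sketch of the setting, assuming known mean rewards and a greedy player; the function name, the instance, and the greedy policy are illustrative choices, not part of the paper's algorithm:

```python
import random

def run_blocking_bandit(mu, delays, T, seed=0):
    """Simulate the blocking contextual bandit protocol: each round an
    i.i.d. context is drawn, a greedy player picks the available arm with
    the highest mean reward for that context, and the played arm is then
    blocked (across all contexts) for a known number of future rounds."""
    rng = random.Random(seed)
    contexts = list(mu.keys())
    n_arms = len(delays)
    free_at = [0] * n_arms              # first round each arm is playable again
    total = 0.0
    for t in range(T):
        c = rng.choice(contexts)        # independently sampled context
        avail = [i for i in range(n_arms) if free_at[i] <= t]
        if not avail:
            continue                    # every arm is blocked this round
        i = max(avail, key=lambda a: mu[c][a])   # greedy w.r.t. context means
        total += mu[c][i]               # accumulate expected reward
        free_at[i] = t + 1 + delays[i]  # block arm i for delays[i] rounds
    return total

# Hypothetical instance: two contexts, two arms; arm 0 blocks for 2 rounds.
mu = {"A": [1.0, 0.2], "B": [0.3, 0.9]}
reward = run_blocking_bandit(mu, delays=[2, 0], T=1000)
```

The sketch highlights the key tension of the model: blocking couples decisions across rounds, so a myopic greedy choice can sacrifice a high-reward arm that a later, better-matched context would have used.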