We study a variant of the classical stochastic $K$-armed bandit where
observing the outcome of each arm is expensive, but cheap approximations to
this outcome are available. For example, in online advertising the performance
of an ad can be approximated by displaying it for shorter tim