We address the problem of learning in an online setting where the learner
repeatedly observes features, selects among a set of actions, and receives
reward only for the action taken. We provide the first efficient algorithm with
optimal regret. Our algorithm uses a cost-sensitive classification learner as
an oracle and has a running time $\mathrm{polylog}(N)$,