Designing efficient general-purpose contextual bandit algorithms that work
with large -- or even continuous -- action spaces would facilitate application
to important scenarios such as information retrieval, recommendation systems,
and continuous control. While obtaining standard regre