Unlike traditional supervised learning, in many settings only partial
feedback is available. We may only observe outcomes for the chosen actions, but
not the counterfactual outcomes associated with other alternatives. Such
settings encompass a wide variety of applications including pri