Standard contextual bandit problem assumes that all the relevant contexts are
observed before the algorithm chooses an arm. This modeling paradigm, while
useful, often falls short when dealing with problems in which valuable
additional context can be observed after arm selection. For e