off-policy learning plays a pivotal role in optimizing and evaluating
policies prior to the online deployment. However, during the real-time serving,
we observe varieties of interventions and constraints that cause inconsistency
between the online and offline settings, which we summari