This manuscript introduces the idea of using Distributionally Robust Optimization (DRO) for the Counterfactual Risk Minimization (CRM) problem. Tapping into a rich existing literature, we show that DRO is a principled tool for counterfactual decision making. We also show that well-established solutions to the CRM problem, such as sample variance penalization schemes, are special instances of a more general DRO problem. In this unifying framework, a variety of distributionally robust counterfactual risk estimators can be constructed using various probability distances and divergences as uncertainty measures. We propose the use of the Kullback-Leibler divergence as an alternative way to model uncertainty in CRM and derive a new robust counterfactual objective. In our experiments, we show that this approach outperforms the state of the art on four benchmark datasets, validating the relevance of using other uncertainty measures in practical applications.
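
To make the idea of a Kullback-Leibler robust objective concrete, here is a minimal sketch. It is not the estimator derived in this manuscript, only an illustration under stated assumptions. It uses the standard importance-weighted (inverse propensity) loss from the CRM setting and the well-known penalized dual of KL-constrained DRO, in which the worst-case expected loss under a KL penalty with weight gamma equals gamma times the log of the mean of exp(loss / gamma). The function names `ips_losses` and `kl_robust_risk`, and the fixed temperature `gamma`, are hypothetical choices made for this example.

```python
import numpy as np


def ips_losses(costs, target_probs, logging_probs):
    """Importance-weighted per-sample losses: cost * pi(a|x) / pi_0(a|x).

    Hypothetical helper: assumes the logging propensities are known and positive.
    """
    costs = np.asarray(costs, dtype=float)
    weights = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    return costs * weights


def kl_robust_risk(losses, gamma):
    """Penalized KL-DRO dual: gamma * log(mean(exp(losses / gamma))).

    Assumption: gamma > 0 is a fixed temperature, not optimized here.
    A max-shift is applied so the exponentials do not overflow.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    z = np.asarray(losses, dtype=float) / gamma
    shift = z.max()
    return gamma * (shift + np.log(np.mean(np.exp(z - shift))))


# Toy logged data: three interactions with observed costs and propensities.
losses = ips_losses(
    costs=[1.0, 0.0, 1.0],
    target_probs=[0.5, 0.2, 0.1],
    logging_probs=[0.25, 0.4, 0.5],
)
empirical_risk = losses.mean()
robust_risk = kl_robust_risk(losses, gamma=1.0)
```

By Jensen's inequality the robust value is never below the plain empirical average, and it approaches that average as `gamma` grows, so `gamma` controls how pessimistic the objective is about the logged sample.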

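This paper introduces the idea of using Distributionally Robust Optimization (DRO) to solve the Counterfactual Risk Minimization (CRM) problem and shows that DRO is a principled tool for counterfactual decision making. We propose the Kullback-Leibler divergence as an alternative way to model uncertainty in CRM and, building on it, derive a new robust counterfactual objective. Our experiments show that using other uncertainty measures is relevant in practical applications.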

Distributionally Robust Counterfactual Risk Minimization