Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters

Jul, 2018

Aniruddh Raghu, Omer Gottesman, Yao Liu, Matthieu Komorowski, Aldo Faisal...

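TL;DR: We consider the problem of estimating a behaviour policy for use in off-policy policy evaluation (OPE) when the true behaviour policy is unknown. Through a series of empirical studies, we show how the accuracy of OPE depends on the calibration of the behaviour policy model, and how a simple, non-parametric k-nearest neighbours model yields better-calibrated behaviour policy estimates that can be used to obtain good importance-sampling-based OPE estimates.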
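To make the idea concrete, here is a minimal sketch of a k-nearest-neighbours behaviour policy estimate feeding an ordinary importance sampling estimator. It is an illustration under stated assumptions, not the authors' implementation: the function names, the Euclidean distance, the choice of `k`, and the probability clipping constant `eps` are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def knn_behaviour_probs(train_states, train_actions, query_state, n_actions, k=3):
    """Estimate pi_b(a | s) as the fraction of the k nearest logged states
    in which action a was taken (Euclidean distance is an assumption)."""
    dists = [math.dist(s, query_state) for s in train_states]
    # Indices of the k closest logged states; ties are broken by index order.
    nearest = sorted(range(len(train_states)), key=lambda i: dists[i])[:k]
    counts = Counter(train_actions[i] for i in nearest)
    return [counts.get(a, 0) / len(nearest) for a in range(n_actions)]


def importance_sampling_estimate(trajectories, eval_policy, behaviour_probs,
                                 gamma=1.0, eps=1e-3):
    """Ordinary per-trajectory importance sampling estimate of the value of
    eval_policy. Each trajectory is a list of (state, action, reward) tuples;
    eval_policy and behaviour_probs map a state to action probabilities."""
    total = 0.0
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            # Clip the estimated behaviour probability away from zero
            # (an assumption of this sketch, to avoid division by zero).
            pb = max(behaviour_probs(s)[a], eps)
            weight *= eval_policy(s)[a] / pb
            ret += (gamma ** t) * r
        total += weight * ret
    return total / len(trajectories)
```

In use, `behaviour_probs` would typically wrap `knn_behaviour_probs` over the logged data, for example `lambda s: knn_behaviour_probs(states, actions, s, n_actions=2, k=3)`. Because the importance weights divide by the estimated behaviour probabilities, a poorly calibrated estimate distorts the weights directly, which is why calibration matters here.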

Abstract

In this work, we consider the problem of estimating a behaviour policy for use in off-policy policy evaluation (OPE) when the true behaviour poli