State Relevance for Off-Policy Evaluation

Sep, 2021

Simon P. Shen, Yecheng Jason Ma, Omer Gottesman, Finale Doshi-Velez

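TL;DR: This paper proposes OSIRIS, a method that reduces the variance of importance sampling estimators by omitting the likelihood ratios associated with certain states, making the estimators more efficient while relying on relatively few assumptions.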
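To make the idea in the TL;DR concrete, here is a minimal Python sketch of a trajectory-wise importance sampling estimate in which the likelihood ratios at selected time steps can be omitted (treated as 1). The function name `is_estimate`, the `keep_mask` argument, and the toy numbers are assumptions introduced here for illustration only. This excerpt does not describe how OSIRIS decides which states to omit, so the mask is simply supplied by hand.

```python
import numpy as np


def is_estimate(ratios, returns, keep_mask=None):
    """Trajectory-wise importance sampling estimate of a policy's value.

    ratios:    array of shape (n_trajectories, horizon) holding the per-step
               likelihood ratios pi_e(a_t | s_t) / pi_b(a_t | s_t).
    returns:   array of shape (n_trajectories,) holding the observed returns.
    keep_mask: optional boolean array with the same shape as `ratios`.
               False marks a step whose ratio is omitted (replaced by 1).
               Hypothetical argument: how to choose it is NOT specified here.
    """
    ratios = np.asarray(ratios, dtype=float)
    returns = np.asarray(returns, dtype=float)
    if keep_mask is not None:
        # Omitting a likelihood ratio is the same as multiplying by 1.
        ratios = np.where(np.asarray(keep_mask, dtype=bool), ratios, 1.0)
    # Each trajectory's weight is the product of its remaining ratios.
    weights = ratios.prod(axis=1)
    return float(np.mean(weights * returns))


# Toy data (made up): two trajectories, two steps each.
toy_ratios = [[2.0, 0.5], [1.0, 3.0]]
toy_returns = [1.0, 2.0]

# Ordinary importance sampling keeps every ratio.
full_estimate = is_estimate(toy_ratios, toy_returns)

# Hand-picked mask that omits the ratio at the second step of each trajectory.
toy_mask = [[True, False], [True, False]]
masked_estimate = is_estimate(toy_ratios, toy_returns, keep_mask=toy_mask)
```

Fewer ratios in the product means fewer multiplicative noise terms in each trajectory weight, which is the intuition behind the variance reduction. Whether omitting a given ratio leaves the estimate unbiased depends on which states are chosen, and this sketch does not address that question.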

Abstract

Importance sampling-based estimators for off-policy evaluation (OPE) are valued for their simplicity, unbiasedness, and reliance on relatively few assumptions. However, the variance of these estimators is often h