Sep, 2021
State Relevance for Off-Policy Evaluation
Simon P. Shen, Yecheng Jason Ma, Omer Gottesman, Finale Doshi-Velez
TL;DR
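This paper proposes OSIRIS, a method that reduces the variance of importance sampling estimators by omitting the likelihood ratios associated with certain states, making the estimators more efficient while still relying on relatively few assumptions.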
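The idea of dropping likelihood ratios can be illustrated with a minimal sketch. It is not the paper's OSIRIS procedure: how OSIRIS decides which states to omit is not described here, so the `keep` mask below is simply assumed to be given. The function name `is_estimate` and all parameter names are hypothetical. The sketch computes a trajectory-wise importance sampling estimate and treats every omitted step's ratio as 1.

```python
import numpy as np

def is_estimate(rewards, ratios, keep=None, gamma=1.0):
    """Trajectory-wise importance sampling estimate of a policy's value.

    rewards, ratios: arrays of shape (n_trajectories, horizon), where
        ratios[i, t] = pi_e(a_t | s_t) / pi_b(a_t | s_t).
    keep: optional boolean array of the same shape. Where it is False,
        the likelihood ratio of that step is omitted (replaced by 1).
        The mask is an assumption of this sketch, supplied by the caller.
    """
    rewards = np.asarray(rewards, dtype=float)
    ratios = np.asarray(ratios, dtype=float)
    if keep is not None:
        ratios = np.where(np.asarray(keep, dtype=bool), ratios, 1.0)
    weights = ratios.prod(axis=1)                      # one weight per trajectory
    discounts = gamma ** np.arange(rewards.shape[1])   # gamma^t for each step
    returns = (rewards * discounts).sum(axis=1)        # discounted return
    return float((weights * returns).mean())

# Toy data: two trajectories of two steps; only the second step earns reward.
rewards = [[0.0, 1.0], [0.0, 0.0]]
ratios = [[1.8, 1.8], [0.2, 0.2]]
keep = [[False, True], [False, True]]  # assume the first step is irrelevant

print(is_estimate(rewards, ratios))        # all ratios multiplied together
print(is_estimate(rewards, ratios, keep))  # first-step ratios omitted
```

With fewer ratios in each product, the trajectory weights spread less widely, which is the source of the variance reduction. Omitting a ratio is only harmless when the action taken at that step does not affect the return, so the choice of mask matters.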
Abstract
Importance sampling-based estimators for off-policy evaluation (OPE) are valued for their simplicity, unbiasedness, and reliance on relatively few assumptions. However, the variance of these estimators is often h