BriefGPT.xyz
Jun, 2019
Balanced Off-Policy Evaluation in General Action Spaces
Arjun Sondhi, David Arbour, Drew Dimmery
TL;DR
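The paper proposes balanced off-policy evaluation (B-OPE), a general method for estimating weights that reduces imbalance by minimizing the risk of the weight estimator. Its binary-classification formulation applies to all action types and makes hyperparameter tuning straightforward, and experiments demonstrate its use in off-policy evaluation.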
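The binary-classification idea mentioned in the summary can be illustrated with a small sketch. This is a toy under stated assumptions, not the authors' implementation: the binary context and action, the logging and target policies, the reward rule, and the helper names (`features`, `fit_logistic`, `weight`) are all hypothetical. A classifier is trained to tell logged (context, action) pairs from pairs whose action was resampled from the target policy, and the classifier odds p / (1 - p) serve as importance weights.

```python
import math
import random

def features(x, a):
    """One-hot encoding of the (context, action) pair; x and a are in {0, 1}."""
    vec = [0.0] * 4
    vec[2 * x + a] = 1.0
    return vec

def fit_logistic(X, y, lr=4.0, steps=100):
    """Gradient descent on the mean log loss (no intercept, no regularization)."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-score))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Hypothetical policies, each giving P(action = 1 | context).
def logging_prob(x):
    return 0.8 if x == 1 else 0.2

def target_prob(x):
    return 0.5

rng = random.Random(0)
n = 2000
contexts = [rng.randint(0, 1) for _ in range(n)]
logged_actions = [int(rng.random() < logging_prob(x)) for x in contexts]
# Assumed reward rule: 1 when the action matches the context, else 0.
rewards = [1.0 if a == x else 0.0 for x, a in zip(contexts, logged_actions)]
# Same contexts, but actions resampled from the target policy.
target_actions = [int(rng.random() < target_prob(x)) for x in contexts]

# Label 0 = logged pair, label 1 = target-policy pair (balanced classes).
X = [features(x, a) for x, a in zip(contexts, logged_actions)]
X += [features(x, a) for x, a in zip(contexts, target_actions)]
y = [0.0] * n + [1.0] * n
w = fit_logistic(X, y)

def weight(x, a):
    """Classifier odds p / (1 - p), used as the importance weight."""
    score = sum(wj * xj for wj, xj in zip(w, features(x, a)))
    p = 1.0 / (1.0 + math.exp(-score))
    return p / (1.0 - p)

weights = [weight(x, a) for x, a in zip(contexts, logged_actions)]
# Self-normalized weighted average of the logged rewards.
value_estimate = sum(wi * ri for wi, ri in zip(weights, rewards)) / sum(weights)
naive_estimate = sum(rewards) / n

print("naive mean of logged rewards:", round(naive_estimate, 3))
print("classifier-weighted estimate:", round(value_estimate, 3))
```

In this toy setup the target policy picks each action with probability 0.5, so its true value is 0.5, while the logging policy earns about 0.8. The unweighted mean of logged rewards therefore overstates the target policy's value, and the classifier-based weights correct for the mismatch.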
Abstract
In many practical applications of contextual bandits, online learning is infeasible and practitioners must rely on off-policy evaluation (OPE) of logged data collected from prior policies. OPE generally consists …