BriefGPT.xyz
Jun, 2020
Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting
Ilja Kuzborskij, Claire Vernade, András György, Csaba Szepesvári
TL;DR
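This work studies off-policy evaluation in the contextual bandit setting and proposes a new method that estimates a lower bound on the value of a target policy using self-normalized importance weighting. Experiments on synthetic and real datasets show that the method compares favorably in both the tightness of its confidence intervals and the quality of the policies it selects.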
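As a rough illustration of the quantity involved, the sketch below computes a self-normalized importance-weighted point estimate of a target policy's value from logged bandit data. It is not the paper's method: the confidence lower bound the authors derive is not reproduced here, and the function name `sniw_estimate` and its three arguments are hypothetical names chosen for this example.

```python
def sniw_estimate(target_probs, behavior_probs, rewards):
    """Self-normalized importance-weighted value estimate (point estimate only).

    target_probs[i]   : probability the target policy assigns to the logged action i
    behavior_probs[i] : probability the logging (behavior) policy assigned to it
    rewards[i]        : observed reward for logged action i
    All three names are assumptions made for this sketch.
    """
    weights = []
    for p_target, p_behavior in zip(target_probs, behavior_probs):
        if p_behavior <= 0:
            raise ValueError("behavior policy must give positive probability to logged actions")
        # Importance weight: how much more (or less) likely the target policy
        # is to take the logged action than the behavior policy was.
        weights.append(p_target / p_behavior)

    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("all importance weights are zero; the estimate is undefined")

    # Normalizing by the sum of weights (rather than the sample size) is what
    # makes the estimator "self-normalized".
    weighted_reward = sum(w * r for w, r in zip(weights, rewards))
    return weighted_reward / total_weight


# Toy logged data: four rounds with binary rewards.
estimate = sniw_estimate(
    target_probs=[0.5, 0.25, 0.5, 0.25],
    behavior_probs=[0.25, 0.5, 0.5, 0.25],
    rewards=[1.0, 0.0, 1.0, 0.0],
)
print(estimate)
```

Because the weights are normalized by their own sum, the estimate always lies within the range of the observed rewards, at the cost of a small bias compared with the plain importance-weighted average.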
Abstract
We consider off-policy evaluation in the contextual bandit setting for the purpose of obtaining a robust off-policy selection strategy, wh