BriefGPT.xyz
Jun, 2024
AutoOPE: Automated Off-Policy Estimator Selection
Nicolò Felicioni, Michael Benigni, Maurizio Ferrari Dacrema
TL;DR
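An automated, data-driven method for selecting off-policy evaluation estimators. A machine learning model trained on synthetic tasks predicts the best estimator; the method selects better estimators on several real-world datasets at significantly lower computational cost.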
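To make the setting concrete, here is a minimal sketch of what a single off-policy estimator computes. It uses inverse propensity scoring (IPS), a standard estimator of the kind one would choose among; it is not the paper's selection method, and the function name `ips_estimate`, the toy log, and the `always_a` policy are all illustrative assumptions.

```python
def ips_estimate(logged, target_policy):
    """Inverse propensity scoring estimate of a target policy's value.

    logged: list of (context, action, reward, logging_prob) tuples, where
            logging_prob is the probability the logging policy assigned to
            the logged action in that context.
    target_policy: function (context, action) -> probability of that action
                   under the policy being evaluated.
    """
    if not logged:
        raise ValueError("need at least one logged interaction")
    total = 0.0
    for context, action, reward, logging_prob in logged:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_policy(context, action) / logging_prob
        total += weight * reward
    return total / len(logged)


# Hypothetical toy log collected by a uniform logging policy over two actions.
logged = [
    ("u1", "a", 1.0, 0.5),
    ("u1", "b", 0.0, 0.5),
    ("u2", "a", 0.0, 0.5),
    ("u2", "b", 1.0, 0.5),
]


def always_a(context, action):
    # Hypothetical deterministic target policy: always choose action "a".
    return 1.0 if action == "a" else 0.0


value = ips_estimate(logged, always_a)
print(value)
```

On this toy log the estimate matches the true value of `always_a` (reward 1 for `u1`, 0 for `u2`). Different estimators trade bias against variance in different ways, which is why choosing among them for a given task is a problem in its own right.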
Abstract
The off-policy evaluation (OPE) problem consists of evaluating the performance of counterfactual policies with data collected by another one. This problem is of utmost importance for various application domains,