BriefGPT.xyz
Dec, 2019
More Efficient Off-Policy Evaluation through Regularized Targeted Learning
Aurélien F. Bibaut, Ivana Malenica, Nikos Vlassis, Mark J. van der Laan
TL;DR
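This paper introduces a new doubly robust estimator, built on the targeted maximum likelihood estimation principle from causal inference, along with several variance reduction techniques. It outperforms existing estimators across multiple reinforcement learning environments and at various levels of model misspecification.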
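To illustrate the general doubly robust idea mentioned above, here is a minimal sketch for the simpler one-step (contextual bandit) setting. It is not the estimator proposed in the paper, which builds on targeted maximum likelihood estimation and adds regularization. The function name `doubly_robust_value`, the log format, and the toy policies are assumptions made for this example.

```python
from typing import Callable, Hashable, Sequence, Tuple

# One logged interaction: (context, action, reward, behavior_prob), where
# behavior_prob is the probability the logging policy gave to the action taken.
LogEntry = Tuple[Hashable, int, float, float]


def doubly_robust_value(
    logs: Sequence[LogEntry],
    target_policy: Callable[[Hashable, int], float],
    q_hat: Callable[[Hashable, int], float],
    actions: Sequence[int],
) -> float:
    """Generic doubly robust estimate of the target policy's value (hypothetical helper)."""
    total = 0.0
    for context, action, reward, behavior_prob in logs:
        # Model-based term: expected reward under the target policy according to q_hat.
        direct = sum(target_policy(context, a) * q_hat(context, a) for a in actions)
        # Importance-weighted correction for the error of q_hat on the logged action.
        rho = target_policy(context, action) / behavior_prob
        total += direct + rho * (reward - q_hat(context, action))
    return total / len(logs)


# Toy data (assumed): a uniform behavior policy over two actions; reward equals the action.
logs = [(0, 1, 1.0, 0.5), (0, 0, 0.0, 0.5)]


def always_one(context: Hashable, action: int) -> float:
    """Target policy that always chooses action 1."""
    return 1.0 if action == 1 else 0.0


def exact_q(context: Hashable, action: int) -> float:
    """A correctly specified reward model for the toy data."""
    return float(action)


def zero_q(context: Hashable, action: int) -> float:
    """A misspecified reward model that always predicts zero."""
    return 0.0


value_with_exact_model = doubly_robust_value(logs, always_one, exact_q, [0, 1])
value_with_zero_model = doubly_robust_value(logs, always_one, zero_q, [0, 1])
```

With the zero reward model, the estimate reduces to plain importance sampling. With the exact model, the correction term vanishes. On this toy data both estimates equal the true value of the target policy.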
Abstract
We study the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the aim is to estimate the performance of a new policy given historical data that may have been generated by a different p