BriefGPT.xyz
Feb, 2018
More Robust Doubly Robust Off-policy Evaluation
Mehrdad Farajtabar, Yinlam Chow, Mohammad Ghavamzadeh
TL;DR
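This paper addresses off-policy evaluation in reinforcement learning and proposes MRDR, a more robust doubly robust (DR) estimator that learns the model parameters by minimizing the variance of the DR estimator. The authors prove that MRDR is strongly consistent and asymptotically optimal, and evaluate it on contextual bandit and reinforcement learning benchmark problems.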
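For orientation, here is a minimal sketch of the standard doubly robust estimator in the contextual bandit setting, which is the quantity whose variance MRDR is described as minimizing. This is not the paper's MRDR procedure: the reward model `q_hat` is simply passed in, whereas MRDR would fit it by minimizing the variance of this estimate. The function name, argument names, and the toy data layout are assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

# One logged interaction: (context, action taken, observed reward).
Sample = Tuple[int, int, float]


def doubly_robust_value(
    data: Sequence[Sample],
    n_actions: int,
    pi_e: Callable[[int, int], float],   # evaluation policy: pi_e(x, a)
    pi_b: Callable[[int, int], float],   # behavior policy: pi_b(x, a) > 0
    q_hat: Callable[[int, int], float],  # reward model: q_hat(x, a)
) -> float:
    """Standard DR estimate of the evaluation policy's value (hypothetical helper)."""
    total = 0.0
    for x, a, r in data:
        # Model-based term: expected model reward under the evaluation policy.
        baseline = sum(pi_e(x, b) * q_hat(x, b) for b in range(n_actions))
        # Importance weight of the logged action.
        rho = pi_e(x, a) / pi_b(x, a)
        # Importance-weighted correction of the model's error on the logged action.
        total += baseline + rho * (r - q_hat(x, a))
    return total / len(data)


# Toy problem (assumed): one context, two actions, reward equals the action index.
logged = [(0, 0, 0.0), (0, 1, 1.0)]
eval_policy = lambda x, a: (0.2, 0.8)[a]
behavior_policy = lambda x, a: 0.5
exact_model = lambda x, a: float(a)
zero_model = lambda x, a: 0.0

value_with_model = doubly_robust_value(logged, 2, eval_policy, behavior_policy, exact_model)
value_without_model = doubly_robust_value(logged, 2, eval_policy, behavior_policy, zero_model)
```

With a zero reward model the estimate reduces to plain importance sampling, and with an exact model the correction term vanishes. The choice of `q_hat` therefore affects the variance of the estimate rather than its unbiasedness, which is the degree of freedom the TL;DR says MRDR exploits.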
Abstract
We study the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goal is to estimate the performance of a policy from the data generated by another policy(ies). In particular, we focu