BriefGPT.xyz
Oct, 2019
Minimax Weight and Q-Function Learning for Off-Policy Evaluation
Masatoshi Uehara, Nan Jiang
TL;DR
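This paper studies off-policy evaluation in reinforcement learning. It proposes two new importance-ratio estimators and gives results that include a sample-complexity analysis and asymptotic optimality.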
Abstract
We provide theoretical investigations into off-policy evaluation in reinforcement learning using function approximators for (marginalized) import…
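To make "marginalized importance weights" concrete, here is a minimal sketch of one minimax-style weight-learning objective, assuming the loss L(w, f) = E_μ[w(s,a)(γ f(s', π) − f(s,a))] + (1 − γ) E_{s0}[f(s0, π)], where w approximates the ratio d_π/d_μ of the target policy's normalized discounted occupancy to the data distribution, and the policy value is then estimated as E_μ[w·r]/(1 − γ). This is not the authors' code. The choice of linear classes for both w and the discriminator f is an assumption made here because it turns "L(w, f) = 0 for every f" into a single linear system. The function names `mwl_linear` and `ope_estimate`, the feature layout, and the two-state toy MDP are all hypothetical.

```python
import numpy as np

def mwl_linear(phi, phi_next_pi, phi0_pi, gamma):
    """Hypothetical sketch: weight learning with linear w(s,a) = phi(s,a) @ alpha.

    phi:         (n, d) features phi(s_i, a_i) of logged state-action pairs
    phi_next_pi: (n, d) E_{a'~pi}[phi(s'_i, a')] at the logged next states
    phi0_pi:     (m, d) E_{a~pi}[phi(s0_j, a)] at sampled initial states
    """
    n = phi.shape[0]
    # With linear w and linear discriminators f(s,a) = phi(s,a) @ beta, the assumed
    # loss is zero for every beta iff E[(phi - gamma*phi') phi^T] alpha = b.
    M = (phi - gamma * phi_next_pi).T @ phi / n
    b = (1 - gamma) * phi0_pi.mean(axis=0)
    return np.linalg.solve(M, b)

def ope_estimate(phi, rewards, alpha, gamma):
    # Weighted average of logged rewards, rescaled from the normalized
    # occupancy back to the discounted return.
    w = phi @ alpha
    return float(np.mean(w * rewards) / (1 - gamma))

# Toy MDP (hypothetical): states {0, 1}, actions {0, 1}; action a moves
# deterministically to state a, and the reward is 1 whenever s == 1.
gamma = 0.9
pi = np.array([[0.2, 0.8], [0.2, 0.8]])  # target policy pi[s, a]

def onehot(s, a):
    v = np.zeros(4)
    v[2 * s + a] = 1.0
    return v

def feat_pi(s):
    # Expected feature under the target policy at state s.
    return sum(pi[s, a] * onehot(s, a) for a in range(2))

# Logged data (s, a, r, s'): each state-action pair appears once (uniform mu).
data = [(s, a, float(s == 1), a) for s in range(2) for a in range(2)]
phi = np.array([onehot(s, a) for s, a, _, _ in data])
phi_next = np.array([feat_pi(s2) for _, _, _, s2 in data])
rewards = np.array([r for _, _, r, _ in data])
phi0 = np.array([feat_pi(0)])  # every episode starts in state 0

alpha = mwl_linear(phi, phi_next, phi0, gamma)
v_hat = ope_estimate(phi, rewards, alpha, gamma)
print(round(v_hat, 6))  # 7.2
```

Because the toy transitions are deterministic and every state-action pair is logged once, the empirical expectations are exact here. The recovered weights equal d_π/μ, and the estimate matches the true discounted return 0.8·γ/(1 − γ) = 7.2. With one-hot features this reduces to the tabular case. Richer feature maps trade that exactness for generalization.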