BriefGPT.xyz
Jun, 2019
Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling
Tengyang Xie, Yifei Ma, Yu-Xiang Wang
TL;DR
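This work proposes a lower-variance algorithm based on marginalized importance sampling (MIS) for off-policy evaluation (OPE) in reinforcement learning (RL) with long-horizon MDPs, and reports good performance across several environments.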
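The summary above only names the idea. The sketch below shows one simple tabular form of a marginalized importance sampling estimator, written under stated assumptions: finite states and actions, a fixed horizon, a known behavior policy mu, logged steps in a hypothetical (state, action, reward, next_state) format, and a time-indexed empirical transition model used to estimate the target policy's state marginals. It weights each logged reward r_t by the estimated state-marginal ratio d_pi_t(s_t) / d_mu_t(s_t) times a single-step ratio pi(a_t|s_t) / mu(a_t|s_t). The names `mis_estimate`, `prob`, `Step`, and `Policy` are illustrative only; this is not the exact estimator from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical logged-step format: (state, action, reward, next_state).
Step = Tuple[int, int, float, int]
# Tabular policy: policy[s][a] = probability of taking action a in state s.
Policy = Dict[int, Dict[int, float]]


def prob(policy: Policy, s: int, a: int) -> float:
    """Probability of action a in state s; missing entries count as 0."""
    return policy.get(s, {}).get(a, 0.0)


def mis_estimate(trajectories: List[List[Step]], pi: Policy, mu: Policy) -> float:
    """Tabular marginalized importance sampling estimate of pi's expected return.

    Assumes: all trajectories share the same horizon, the behavior policy mu is
    known, and mu covers pi (mu[s][a] > 0 wherever pi may act in the data).
    """
    n = len(trajectories)
    horizon = len(trajectories[0])

    # Empirical behavior-policy state marginals d_mu[t][s].
    d_mu = [defaultdict(float) for _ in range(horizon)]
    # Counts for a time-indexed empirical transition model P_hat_t(s' | s, a).
    sa_counts = [defaultdict(int) for _ in range(horizon)]
    sas_counts = [defaultdict(int) for _ in range(horizon)]
    for traj in trajectories:
        for t, (s, a, _, s_next) in enumerate(traj):
            d_mu[t][s] += 1.0 / n
            sa_counts[t][(s, a)] += 1
            sas_counts[t][(s, a, s_next)] += 1

    # Estimated target-policy marginals:
    #   d_pi[t+1][s'] = sum_{s,a} d_pi[t][s] * pi(a|s) * P_hat_t(s'|s,a).
    d_pi = [defaultdict(float) for _ in range(horizon)]
    d_pi[0] = defaultdict(float, d_mu[0])  # shared initial-state distribution
    for t in range(horizon - 1):
        for (s, a, s_next), count in sas_counts[t].items():
            p_hat = count / sa_counts[t][(s, a)]
            d_pi[t + 1][s_next] += d_pi[t][s] * prob(pi, s, a) * p_hat

    # Reweight each logged reward by the state-marginal ratio times a single
    # per-step action ratio (no product of ratios over the whole trajectory).
    total = 0.0
    for traj in trajectories:
        for t, (s, a, r, _) in enumerate(traj):
            state_ratio = d_pi[t][s] / d_mu[t][s]
            action_ratio = prob(pi, s, a) / mu[s][a]
            total += state_ratio * action_ratio * r
    return total / n


# Toy usage (hypothetical data): one state, mu uniform over actions {0, 1},
# pi always picks action 1, reward 1 only for action 1, horizon 2.
logged = [[(0, 1, 1.0, 0), (0, 0, 0.0, 0)], [(0, 0, 0.0, 0), (0, 1, 1.0, 0)]]
print(mis_estimate(logged, pi={0: {1: 1.0}}, mu={0: {0: 0.5, 1: 0.5}}))  # 2.0
```

The toy call prints 2.0, which is the true value of pi in that one-state example. The design point the sketch tries to illustrate: ordinary trajectory-wise importance sampling multiplies H action ratios, so its variance can grow exponentially with the horizon, whereas the marginalized form uses one state ratio and one action ratio per step, at the cost of having to estimate d_pi (here through an empirical transition model).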
Abstract
Motivated by the many real-world applications of reinforcement learning (RL) that require safe-policy iterations, we consider the problem of off-policy evaluation (OPE): the problem of evaluating a new policy …