BriefGPT.xyz
Jan, 2023
Variational Latent Branching Model for Off-Policy Evaluation
Qitong Gao, Ge Gao, Min Chi, Miroslav Pajic
TL;DR
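This paper uses a variational latent branching model (VLBM) to learn the transition function of Markov decision processes and evaluates it through trajectory simulation, showing that VLBM outperforms existing OPE methods.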
Abstract
Model-based methods have recently shown great potential for off-policy evaluation (OPE); offline trajectories induced by behavioral policies are fitted to transitions of Markov decision processes (MDPs), which are …
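The general model-based OPE recipe mentioned above (fit a transition model to offline trajectories, then score a policy on simulated rollouts) can be illustrated with a minimal sketch. It assumes a small discrete MDP and a count-based tabular model; it is not VLBM itself, and the function names, the `(s, a, r, s_next)` tuple layout, and the default parameters are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def fit_tabular_model(trajectories):
    """Estimate P(s' | s, a) and the mean reward r(s, a) by counting
    offline (s, a, r, s_next) tuples. A stand-in for a learned model."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    visits = defaultdict(int)
    for traj in trajectories:
        for s, a, r, s_next in traj:
            counts[(s, a)][s_next] += 1
            reward_sum[(s, a)] += r
            visits[(s, a)] += 1
    transitions = {
        sa: {s2: c / visits[sa] for s2, c in nxt.items()}
        for sa, nxt in counts.items()
    }
    rewards = {sa: reward_sum[sa] / visits[sa] for sa in visits}
    return transitions, rewards


def estimate_return(transitions, rewards, policy, start_state,
                    horizon, gamma=0.99, n_rollouts=1000, seed=0):
    """Average discounted return of `policy` over rollouts simulated
    inside the fitted model (no interaction with the real environment)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rollouts):
        s, discount, ret = start_state, 1.0, 0.0
        for _ in range(horizon):
            a = policy(s)
            if (s, a) not in transitions:
                break  # no offline data for this pair; stop the rollout
            ret += discount * rewards[(s, a)]
            discount *= gamma
            next_states = list(transitions[(s, a)])
            weights = [transitions[(s, a)][s2] for s2 in next_states]
            s = rng.choices(next_states, weights=weights)[0]
        total += ret
    return total / n_rollouts
```

Stopping a rollout at a state-action pair with no offline data is one simple way to handle missing coverage. It biases the estimate toward lower returns, which is part of why richer learned models are of interest for OPE.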