BriefGPT.xyz
May, 2019
Combining Parametric and Nonparametric Models for Off-Policy Evaluation
Omer Gottesman, Yao Liu, Scott Sussex, Emma Brunskill, Finale Doshi-Velez
TL;DR
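We perform batch off-policy evaluation in reinforcement learning with a mixture-of-experts approach that combines parametric and nonparametric models. At each time step, the method selects the model that minimizes an estimate of the error in the predicted return. Across multiple domains, it outperforms the individual models as well as state-of-the-art importance-sampling-based estimators.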
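The per-time-step model selection summarized above can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the paper's method: a toy 1-D environment, a linear least-squares model as the parametric expert, a 1-nearest-neighbour lookup as the nonparametric expert, and placeholder error proxies (the linear model's mean training residual, and an assumed Lipschitz constant times the nearest-neighbour distance). The paper derives its own return-error estimate, which this sketch does not reproduce. The names `fit_parametric`, `predict_parametric`, `predict_nonparametric` and `moe_rollout_value` are hypothetical.

```python
import numpy as np


def fit_parametric(S, A, R, S_next):
    """Parametric expert: linear least-squares map from [s, a, 1] to (r, s')."""
    X = np.column_stack([S, A, np.ones_like(S)])
    Y = np.column_stack([R, S_next])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Hypothetical error proxy: mean absolute training residual, identical for every query.
    err = float(np.abs(X @ W - Y).mean())
    return W, err


def predict_parametric(W, s, a):
    r, s_next = np.array([s, a, 1.0]) @ W
    return float(r), float(s_next)


def predict_nonparametric(S, A, R, S_next, s, a):
    """Nonparametric expert: copy (r, s') from the nearest logged (s, a) pair."""
    dist = np.hypot(S - s, A - a)
    i = int(np.argmin(dist))
    return float(R[i]), float(S_next[i]), float(dist[i])


def moe_rollout_value(data, policy, start_states, horizon, gamma=0.99, lipschitz=1.0):
    """Average discounted return of `policy` under a per-step choice of expert.

    At every step the expert with the smaller *estimated* one-step error is used:
    nonparametric if lipschitz * nearest-neighbour distance <= parametric residual,
    otherwise parametric. Both proxies are assumptions for illustration.
    """
    S, A, R, S_next = data
    W, param_err = fit_parametric(S, A, R, S_next)
    returns, choices = [], []
    for s in start_states:
        total = 0.0
        for t in range(horizon):
            a = policy(s)
            r_np, s_np, dist = predict_nonparametric(S, A, R, S_next, s, a)
            if lipschitz * dist <= param_err:
                r, s = r_np, s_np
                choices.append("nonparametric")
            else:
                r, s = predict_parametric(W, s, a)
                choices.append("parametric")
            total += gamma ** t * r
        returns.append(total)
    return float(np.mean(returns)), choices


# Usage on a hypothetical toy problem: s' = 0.9 s + a, r = -s**2, random logged actions.
rng = np.random.default_rng(0)
S = rng.uniform(-1.0, 1.0, 500)
A = rng.uniform(-0.5, 0.5, 500)
S_next = 0.9 * S + A
R = -S ** 2
value, choices = moe_rollout_value((S, A, R, S_next), lambda s: -0.5 * s, [0.5, -0.5], horizon=10)
print(f"estimated value: {value:.3f}; steps using nonparametric expert: "
      f"{choices.count('nonparametric')}/{len(choices)}")
```

Because the nearest-neighbour proxy grows with distance from the logged data, this sketch falls back to the parametric model when the evaluation policy visits regions the batch never covered, and uses the nonparametric lookup where data are dense.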
Abstract
We consider a model-based approach to perform batch off-policy evaluation in reinforcement learning. Our method takes a mixture-of-experts …