BriefGPT.xyz
Jun, 2024
Forward and Backward State Abstractions for Off-policy Evaluation
Meiling Hao, Pingfan Su, Liyuan Hu, Zoltan Szabo, Qingyuan Zhao...
TL;DR
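This study uses state abstraction to make off-policy evaluation (OPE) more efficient. By constructing a time-reversed MDP from the observed MDP, it derives sufficient conditions for Q-functions and marginalized importance sampling ratios. It then proposes a novel two-step procedure that sequentially projects the original state space onto a smaller one, substantially reducing the sample complexity of OPE caused by high state cardinality.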
Abstract
Off-policy evaluation (OPE) is crucial for evaluating a target policy's impact offline before its deployment. However, achieving accurate OPE in large state spaces remains challenging. This paper studies state abstraction...