BriefGPT.xyz
Dec, 2022
Scaling Marginalized Importance Sampling to High-Dimensional State-Spaces via State Abstraction
Brahma S. Pavse, Josiah P. Hanna
TL;DR
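This work proposes an off-policy evaluation method based on state abstraction: working in a lower-dimensional abstract state space reduces the impact of variance in importance sampling and improves the accuracy and robustness of the evaluation.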
Abstract
We consider the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goal is to estimate the performance of an evaluation policy, $\pi_e$, using a fixed dataset, $\mathcal{D}$, collect