BriefGPT.xyz
Oct, 2019
From Importance Sampling to Doubly Robust Policy Gradient
Jiawei Huang, Nan Jiang
TL;DR
By taking finite differences of estimators from the importance sampling family, policy gradient and its variance-reduction variants are derived, yielding a very general and flexible doubly robust policy gradient estimator; its variance is analyzed, it is compared with existing estimators, and its effectiveness is demonstrated.
Abstract
We show that policy gradient (PG) and its variance reduction variants can be derived by taking finite difference of function evaluations supplied by estimators from the importance sampling (IS) family for off-policy evaluation (OPE).
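
As a minimal illustrative sketch (using generic notation not taken from the paper: a length-H trajectory (s_0, a_0, r_0, ..., s_{H-1}, a_{H-1}, r_{H-1}) generated by the current policy \pi_{\theta_0}, with discount \gamma), the trajectory-wise IS estimate of \pi_\theta's value and its derivative at \theta = \theta_0, i.e., the limit of the finite difference, recover the classic REINFORCE policy gradient:

\[
\hat V_{\mathrm{IS}}(\theta)
\;=\;
\Big(\prod_{t=0}^{H-1} \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_0}(a_t \mid s_t)}\Big)
\sum_{t=0}^{H-1} \gamma^t r_t,
\qquad
\nabla_\theta \hat V_{\mathrm{IS}}(\theta)\Big|_{\theta=\theta_0}
\;=\;
\Big(\sum_{t=0}^{H-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\Big|_{\theta=\theta_0}\Big)
\sum_{t=0}^{H-1} \gamma^t r_t .
\]

The second equality uses the fact that each importance ratio equals 1 at \theta = \theta_0. Per the abstract and TL;DR, applying the same differentiation to a doubly robust OPE estimator in place of \hat V_{\mathrm{IS}} is what yields the more general doubly robust policy gradient family studied in the paper.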