From Importance Sampling to Doubly Robust Policy Gradient

Oct, 2019

From Importance Sampling to Doubly Robust Policy Gradient

Jiawei Huang, Nan Jiang

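TL;DR: Taking finite differences of estimators from the importance sampling family yields policy gradient and its variance-reduced variants. This gives a very general and flexible doubly robust policy gradient estimator; the paper analyzes its variance, compares it with existing estimators, and evaluates its effectiveness.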
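
The finite-difference idea can be illustrated on a toy problem. The sketch below is not the paper's code: it assumes a one-step (bandit) setting with a softmax policy, and every name in it (`softmax`, `is_value_estimate`, `finite_difference_of_is`, `reinforce_gradient`, the reward values) is hypothetical. It estimates the value of a slightly perturbed policy by importance sampling on data drawn from the current policy, takes a finite difference, and compares the result with the standard score-function policy gradient computed on the same samples.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over action preferences.
    z = np.exp(theta - theta.max())
    return z / z.sum()

def is_value_estimate(theta_target, theta_behavior, actions, rewards):
    # Importance sampling estimate of the target policy's value,
    # using samples drawn from the behavior policy.
    ratio = softmax(theta_target)[actions] / softmax(theta_behavior)[actions]
    return np.mean(ratio * rewards)

def finite_difference_of_is(theta, actions, rewards, eps=1e-6):
    # Perturb one parameter at a time and difference the IS estimates.
    grad = np.zeros_like(theta)
    base = is_value_estimate(theta, theta, actions, rewards)
    for i in range(theta.size):
        shifted = theta.copy()
        shifted[i] += eps
        shifted_value = is_value_estimate(shifted, theta, actions, rewards)
        grad[i] = (shifted_value - base) / eps
    return grad

def reinforce_gradient(theta, actions, rewards):
    # Score-function estimator; for softmax, grad log pi(a) = e_a - pi.
    pi = softmax(theta)
    score = np.eye(theta.size)[actions] - pi
    return np.mean(score * rewards[:, None], axis=0)

# Toy data (assumed for illustration): 3 actions with noisy rewards.
rng = np.random.default_rng(0)
theta = np.array([0.2, -0.1, 0.5])
actions = rng.choice(theta.size, size=1000, p=softmax(theta))
rewards = np.array([1.0, 0.0, 2.0])[actions] + rng.normal(scale=0.1, size=1000)

fd_grad = finite_difference_of_is(theta, actions, rewards)
pg_grad = reinforce_gradient(theta, actions, rewards)
```

On the same batch of samples, `fd_grad` and `pg_grad` should agree up to the finite-difference error, which is the single-step version of the correspondence the summary describes.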

Abstract

We show that policy gradient (PG) and its variance reduction variants can be derived by taking finite difference of function evaluations supplied by estimators from the importance sampling (IS) family for