Stochastic variance-reduced gradient (SVRG) is an optimization method originally designed for machine learning problems with a finite-sum structure. SVRG was later shown to work for policy evaluation, a problem in reinforcement learning in which one aims to estimate the value function of a given policy. SVRG uses gradient estimates at two scales. At the slower scale, it computes a full gradient over the whole dataset, which can be prohibitively expensive. In this work, we show that two variants of SVRG for policy evaluation can significantly reduce the number of gradient calculations while preserving a linear convergence rate. More importantly, our theoretical result implies that one does not need to use the entire dataset in every epoch of SVRG when it is applied to policy evaluation with linear function approximation. Our experiments demonstrate the large computational savings provided by the proposed methods.
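
To make the two-scale structure concrete, here is a minimal sketch of generic SVRG on a toy finite-sum least-squares problem. It is not the policy-evaluation algorithm from the paper. The function names, the step size, and the `anchor_batch` parameter (which computes the slow-scale gradient on a random subset instead of the whole dataset) are illustrative assumptions, not the authors' variants.

```python
import random

def grad_i(w, x, y):
    # Gradient of the per-sample loss 0.5 * (w.x - y)^2 with respect to w.
    r = sum(wj * xj for wj, xj in zip(w, x)) - y
    return [r * xj for xj in x]

def svrg(data, dim, epochs=40, inner_steps=None, lr=0.05,
         anchor_batch=None, seed=0):
    # anchor_batch is a hypothetical knob: None uses the full dataset for
    # the slow-scale gradient, an integer uses a random subset of that size.
    rng = random.Random(seed)
    n = len(data)
    inner_steps = inner_steps or n
    w = [0.0] * dim
    for _ in range(epochs):
        w_ref = list(w)  # reference point, frozen for this epoch
        # Slow scale: averaged gradient at the reference point.
        batch = data if anchor_batch is None else rng.sample(data, anchor_batch)
        mu = [0.0] * dim
        for x, y in batch:
            g = grad_i(w_ref, x, y)
            mu = [m + gj / len(batch) for m, gj in zip(mu, g)]
        # Fast scale: variance-reduced stochastic steps.
        for _ in range(inner_steps):
            x, y = data[rng.randrange(n)]
            g_new = grad_i(w, x, y)
            g_old = grad_i(w_ref, x, y)
            w = [wj - lr * (a - b + m)
                 for wj, a, b, m in zip(w, g_new, g_old, mu)]
    return w
```

Calling `svrg(data, dim)` reproduces the standard scheme with a full gradient per epoch, while passing `anchor_batch=k` replaces it with a gradient over `k` sampled points, which is the kind of saving the abstract refers to.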

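This paper proposes two variants of the Stochastic Variance-Reduced Gradient (SVRG) method for policy evaluation that significantly reduce the number of gradient computations while preserving a linear convergence rate. The theoretical analysis shows that, for policy evaluation with linear function approximation, these methods do not need to use the entire dataset in every epoch, and the experiments demonstrate the large computational savings they provide.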

SVRG for Policy Evaluation with Fewer Gradient Evaluations