Stochastic optimization algorithms, particularly stochastic policy gradient (SPG), report significant success in reinforcement learning (RL). Nevertheless, up to now, that how to speedily acquire an optimal solution for RL is still a challenge. To tackle this issue, this work develops a fast SPG algorithm from the perspective of utilizing a momentum, coined SPG-NM. Specifically, in SPG-NM, a novel type of the negative momentum (NM) technique is applied into the classical SPG algorithm. Different from the existing NM techniques, we have adopted a few hyper-parameters in our SPG-NM algorithm. Moreover, the computational complexity is nearly same as the modern SPG-type algorithms, e.g., accelerated policy gradient (APG), which equips SPG with Nesterov's accelerated gradient (NAG). We evaluate the resulting algorithm on two classical tasks, bandit setting and Markov decision process (MDP). Numerical results in different tasks demonstrate faster convergence rate of the resulting algorithm by comparing state-of-the-art algorithms, which confirm the positive impact of NM in accelerating SPG for RL. Also, numerical experiments under different settings confirm the robustness of our SPG-NM algorithm for some certain crucial hyper-parameters, which ride the user feel free in practice.

从利用动量的角度开发了一种称为SPG-NM的快速SPG算法，将一种新型的负动量技术应用于经典的SPG算法，其计算复杂度与现代SPG类型算法几乎相同，并在两个经典任务中评估了该算法的结果，数值实验在不同设置下对我们的SPG-NM算法的稳健性进行了确认。

快速随机策略梯度：负动量用于强化学习