Total Stochastic Gradient Algorithms and Their Applications in Reinforcement Learning

Feb, 2019

Total stochastic gradient algorithms and applications in reinforcement learning

Paavo Parmas

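TL;DR: This paper shows how the total derivative rule can be used to construct gradient estimators on graphical models, and derives new gradient estimators based on density estimation and likelihood ratio gradients. Tests in model-based policy gradient algorithms demonstrate that these methods are effective and help explain why the PILCO algorithm succeeds.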
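For readers unfamiliar with the likelihood ratio gradient mentioned above, the following is a minimal textbook-style sketch, not the estimators derived in the paper. It assumes a one-dimensional Gaussian x ~ N(mu, sigma^2) and estimates the derivative of E[f(x)] with respect to mu by weighting f(x) with the score (x - mu) / sigma^2. The function name `likelihood_ratio_gradient` and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def likelihood_ratio_gradient(mu, sigma, f, n_samples=100_000, seed=0):
    """Monte Carlo estimate of d/dmu E[f(x)] for x ~ N(mu, sigma^2).

    Uses the identity d/dmu E[f(x)] = E[f(x) * d/dmu log p(x; mu, sigma)].
    This is an illustrative sketch, not the paper's method.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mu, sigma)
        # Score of a Gaussian with respect to its mean.
        score = (x - mu) / sigma**2
        total += f(x) * score
    return total / n_samples


# Illustrative check with f(x) = x^2, where E[x^2] = mu^2 + sigma^2,
# so the exact derivative with respect to mu is 2 * mu.
estimate = likelihood_ratio_gradient(mu=1.5, sigma=1.0, f=lambda x: x * x)
exact = 2 * 1.5
print(f"estimate={estimate:.3f}, exact={exact:.3f}")
```

The estimator only needs samples and the log-density gradient, so it works even when f is not differentiable. The price is typically a higher variance than pathwise (reparameterization) estimators.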

Abstract

Backpropagation and the chain rule of derivatives have been prominent; however, the total derivative rule has not enjoyed the same amount of attention. In this work, we show how the →