Are Policy Gradient Algorithms Really Gradient Algorithms?

Jun, 2019

Is the Policy Gradient a Gradient?

Chris Nota, Philip S. Thomas

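TL;DR: Papers published at top conferences are misleading about what happens when the discount factor is dropped from the state distribution. Methods that drop it do not optimize the discounted reward objective, because the update direction that most methods approximate is not the gradient (derivative) of any function. As a result, these algorithms are not guaranteed to converge to a reasonable optimum.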
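To make "dropping the discount factor from the state distribution" concrete, the following is a minimal sketch for a single trajectory, assuming a REINFORCE-style estimator in which the score term at time step t is multiplied by a scalar weight. The function names `discounted_returns` and `score_weights` and the flag `keep_state_discount` are hypothetical and chosen for illustration; they are not taken from the paper. With the flag set, the weight is γ^t · G_t, the form in which the discount also scales the time step itself; without it, the weight is just G_t, the variant the summary describes as dropping the factor.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = sum over k >= t of gamma^(k - t) * r_k, for every step t."""
    returns = []
    running = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def score_weights(rewards, gamma, keep_state_discount):
    """Per-step scalar weights that multiply grad log pi(a_t | s_t).

    keep_state_discount=True  -> gamma^t * G_t (discount also applied to the step index)
    keep_state_discount=False -> G_t           (the gamma^t factor is dropped)
    Both names are illustrative assumptions, not an API from the paper.
    """
    returns = discounted_returns(rewards, gamma)
    if keep_state_discount:
        return [gamma ** t * g for t, g in enumerate(returns)]
    return returns


rewards = [1.0, 1.0, 1.0]
with_factor = score_weights(rewards, gamma=0.5, keep_state_discount=True)
without_factor = score_weights(rewards, gamma=0.5, keep_state_discount=False)
print(with_factor)     # [1.75, 0.75, 0.25]
print(without_factor)  # [1.75, 1.5, 1.0]
```

The two variants agree only at t = 0. At later steps the dropped-factor weights are larger, so later states influence the update more than the discounted objective would prescribe.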

Abstract

The policy gradient theorem describes the gradient of the expected discounted return with respect to an agent's policy parameters. However, most policy gradient methods do not use the discount factor in the manner originally prescribed, and therefore do not optimize the →