Apr, 2017
Equivalence Between Policy Gradients and Soft Q-Learning
John Schulman, Pieter Abbeel, Xi Chen
TL;DR
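$Q$-learning methods can be effective and sample-efficient when they work, yet the $Q$-values they estimate are very inaccurate. This paper offers a partial explanation: $Q$-learning methods are secretly implementing policy gradient updates.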
Abstract
Two of the leading approaches for model-free reinforcement learning are policy gradient methods and $Q$-learning methods. $Q$-learning methods can be effective and sample-efficient when they work, however, it is