BriefGPT.xyz
Jul, 2020
Gradient Temporal-Difference Learning with Regularized Corrections
Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White...
TL;DR
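Introduces TDRC, a new TD method that balances ease of use, soundness, and performance: it performs as well as TD when TD performs well, and remains sound in cases where TD diverges.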
Abstract
It is still common to use Q-learning and temporal difference (TD) learning, even though they have divergence issues and sound Gradient TD a...