BriefGPT.xyz
Jul, 2024
Simplifying Deep Temporal Difference Learning
Matteo Gallici, Mattie Fellows, Benjamin Ellis, Bartomeu Pou, Ivan Masmitja...
TL;DR
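This paper studies the stability problems that arise when training deep reinforcement learning models with Q-learning and TD algorithms, and proposes PQN, a TD algorithm that converges without a target network. PQN can run 50 times faster than the conventional DQN algorithm without sacrificing sample efficiency, making Q-learning a viable alternative among RL algorithms once again.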
Abstract
Q-learning played a foundational role in the field of reinforcement learning (RL). However, TD algorithms with off-policy data, such as Q-learning ...