BriefGPT.xyz
Feb, 2023
Backstepping Temporal Difference Learning
Han-Dong Lim, Donghwan Lee
TL;DR
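This paper offers a unified view, from a purely control-theoretic perspective, of TD-learning algorithms that correct for off-policy error, including GTD and TDC, and proposes a new convergent algorithm based on the backstepping technique. Experiments confirm that the algorithm converges in environments where standard TD-learning is unstable.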
Abstract
Off-policy learning ability is an important feature of reinforcement learning (RL) for practical applications. However, even one of the most elementary RL algorithms, temporal-difference (TD) learning, is known t