BriefGPT.xyz
Jun, 2023
TD收敛性:一个优化视角
TD Convergence: An Optimization Perspective
Kavosh Asadi, Shoham Sabach, Yao Liu, Omer Gottesman, Rasool Fakoor
TL;DR
This work studies the convergence behavior of the temporal-difference (TD) learning algorithm. Formalizing the analysis in the linear TD setting with quadratic loss, the authors show that TD's convergence is determined by the interplay between two forces, then extend the result to settings considerably broader than linear approximation and squared loss, offering a theoretical explanation for TD's successful application in reinforcement learning.
Abstract
We study the convergence behavior of the celebrated temporal-difference (TD) learning algorithm. By looking at the algorithm through the lens of optimization, we first argue that TD can be viewed as an iterative optimization algorithm…
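To make the setting concrete, the following is a minimal sketch (not the paper's code) of the algorithm under discussion: TD(0) with linear value-function approximation under a fixed policy. The random MDP, feature matrix, and step size are illustrative assumptions; the semi-gradient update is the standard TD rule whose convergence the paper analyzes.

```python
# Minimal sketch of linear TD(0) on a small random MDP with a fixed policy.
# All problem parameters (sizes, gamma, alpha) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_features = 5, 3
gamma, alpha = 0.9, 0.05

# Random state features, a fixed-policy transition matrix (rows sum to 1),
# and a reward for each state.
Phi = rng.normal(size=(n_states, n_features))
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
r = rng.normal(size=n_states)

w = np.zeros(n_features)
s = 0
for _ in range(20000):
    s_next = rng.choice(n_states, p=P[s])
    # TD error: reward plus discounted bootstrap minus current estimate.
    delta = r[s] + gamma * Phi[s_next] @ w - Phi[s] @ w
    # Semi-gradient update: the gradient is taken only through the
    # current state's estimate, not the bootstrapped target.
    w += alpha * delta * Phi[s]
    s = s_next

# For reference, the exact value function of the fixed policy:
# v = (I - gamma * P)^{-1} r. Linear TD converges to a projected
# fixed point, so Phi @ w approximates (but need not equal) v.
v_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)
print(np.round(Phi @ w, 2))
print(np.round(v_true, 2))
```

Note that the update is not a gradient step on any fixed loss, which is precisely why its convergence requires the kind of optimization-lens analysis the paper develops.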