Jun, 2023

TD Convergence: An Optimization Perspective

Kavosh Asadi, Shoham Sabach, Yao Liu, Omer Gottesman, Rasool Fakoor

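TL;DR: This work studies the convergence behavior of the temporal-difference (TD) learning algorithm. It formalizes the analysis for a quadratic loss in the linear TD setting and shows that TD's convergence hinges on the interplay between two forces. It then extends this result to settings broader than linear approximation with squared loss, which offers a theoretical explanation for TD's success in reinforcement learning.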
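The "two forces" claim can be made concrete in the simplest linear case. The sketch below treats each TD iteration as exact minimization, over w, of the squared loss ½ Σ_s d(s) (r(s) + γ Σ_s' P(s,s') φ(s') θ − φ(s) w)², with the target parameter θ held at the previous iterate. With a single scalar feature the minimizer is w = (b + c·θ)/a. Here a = Σ_s d(s) φ(s)² is the curvature of the loss in w, and c = γ Σ_s d(s) φ(s) Σ_s' P(s,s') φ(s') measures how strongly the target pulls on w through θ. The iteration contracts exactly when |c| < a. Reading a and c as the paper's two forces is our interpretation. The two-state chain, the weightings d, and every name in the code (`LinearTDProblem`, `forces`, `td_iterate`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinearTDProblem:
    """Hypothetical toy problem: one scalar feature per state."""
    phi: list[float]        # feature value phi(s) of each state
    d: list[float]          # state weighting d(s) used in the squared loss
    P: list[list[float]]    # transition probabilities P[s][s_next]
    r: list[float]          # expected reward r(s)
    gamma: float            # discount factor


def forces(p: LinearTDProblem) -> tuple[float, float]:
    """Return (a, c): curvature of the loss in w, coupling to the target theta."""
    n = len(p.phi)
    a = sum(p.d[s] * p.phi[s] ** 2 for s in range(n))
    c = p.gamma * sum(
        p.d[s] * p.phi[s] * sum(p.P[s][t] * p.phi[t] for t in range(n))
        for s in range(n)
    )
    return a, c


def td_iterate(p: LinearTDProblem, theta0: float, steps: int) -> list[float]:
    """Each step sets theta to the exact argmin over w, with the target frozen at theta."""
    n = len(p.phi)
    a, c = forces(p)
    b = sum(p.d[s] * p.phi[s] * p.r[s] for s in range(n))
    history = [theta0]
    theta = theta0
    for _ in range(steps):
        # Setting the derivative in w to zero gives a * w = b + c * theta.
        theta = (b + c * theta) / a
        history.append(theta)
    return history


# Two states: s0 -> s1, and s1 is absorbing; all rewards are zero.
P = [[0.0, 1.0], [0.0, 1.0]]
off = LinearTDProblem(phi=[1.0, 2.0], d=[0.5, 0.5], P=P, r=[0.0, 0.0], gamma=0.9)
on = LinearTDProblem(phi=[1.0, 2.0], d=[0.0, 1.0], P=P, r=[0.0, 0.0], gamma=0.9)  # stationary d

for name, prob in [("off-policy", off), ("on-policy", on)]:
    a, c = forces(prob)
    final = td_iterate(prob, theta0=1.0, steps=50)[-1]
    print(f"{name}: a={a:.2f} c={c:.2f} contracts={abs(c) < a} theta_50={final:.4g}")
```

With the off-policy weighting d = (0.5, 0.5), the coupling c = 2.7 exceeds the curvature a = 2.5. Each step therefore multiplies θ by 1.08, and the iterates grow without bound. With the stationary weighting d = (0, 1), c = 3.6 < a = 4.0, and the iterates shrink by a factor of 0.9 toward the fixed point θ = 0.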

Abstract

We study the convergence behavior of the celebrated temporal-difference (TD) learning algorithm. By looking at the algorithm through the lens of optimization, we first argue that TD can be viewed as an iterative optimiz