The family of temporal difference (TD) methods spans a spectrum from
computationally frugal linear methods like TD({\lambda}) to data-efficient
least-squares methods. Least-squares methods make the best use of available data
by directly computing the TD solution and thus do not require tuning a typically
highly sensitive learning rate parameter, but require quadr
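
To illustrate the two ends of this spectrum, the following is a minimal sketch, not the authors' implementation. It assumes linear value functions over a feature matrix and uses a hypothetical toy problem (a deterministic cycle with one-hot features). The function names `lstd`, `td_lambda`, and `make_cycle_transitions`, and the small ridge term `reg`, are illustrative assumptions.

```python
import numpy as np

def make_cycle_transitions(n_states, n_steps):
    """Hypothetical toy data: a deterministic cycle s -> (s + 1) mod n_states,
    with reward 1 on leaving state 0 and one-hot (tabular) features."""
    states = np.arange(n_steps) % n_states
    next_states = (states + 1) % n_states
    eye = np.eye(n_states)
    rewards = (states == 0).astype(float)
    return eye[states], eye[next_states], rewards

def lstd(phi, phi_next, rewards, gamma, reg=1e-8):
    """Least-squares TD: solve A w = b directly. No learning rate is needed,
    but a d x d matrix must be built and solved."""
    d = phi.shape[1]
    A = phi.T @ (phi - gamma * phi_next) + reg * np.eye(d)  # reg is an assumed ridge term
    b = phi.T @ rewards
    return np.linalg.solve(A, b)

def td_lambda(phi, phi_next, rewards, gamma, lam, alpha):
    """Linear TD(lambda) with accumulating eligibility traces: O(d) work per
    transition, but the result depends on the learning rate alpha."""
    d = phi.shape[1]
    w = np.zeros(d)
    z = np.zeros(d)  # eligibility trace
    for x, x_next, r in zip(phi, phi_next, rewards):
        z = gamma * lam * z + x
        delta = r + gamma * (w @ x_next) - (w @ x)  # TD error
        w = w + alpha * delta * z
    return w

if __name__ == "__main__":
    phi, phi_next, rewards = make_cycle_transitions(n_states=4, n_steps=20000)
    print("LSTD:     ", lstd(phi, phi_next, rewards, gamma=0.9))
    print("TD(0.5):  ", td_lambda(phi, phi_next, rewards, gamma=0.9, lam=0.5, alpha=0.1))
```

On this toy problem both estimators should approach the same value function; the difference lies in per-step cost and in whether a learning rate has to be chosen.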
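Neural temporal difference (TD) learning is an approximate TD method for policy evaluation that uses a neural network for function approximation. This paper presents a convergence analysis of neural TD learning projected onto the ball B(θ₀, ω) of radius ω around the initial point θ₀, and establishes an approximation bound of O(ε) + Õ(1/√m), where ε is the approximation quality of the best neural network in B(θ₀, ω) and m is the width of all hidden layers of the network.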
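
The projected update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: it uses a single hidden ReLU layer of width m with fixed ±1 output weights and 1/√m scaling, a semi-gradient TD(0) step, and Euclidean (Frobenius) projection onto B(θ₀, ω). All function names and hyperparameters are hypothetical.

```python
import numpy as np

def init_network(d, m, rng):
    """Assumed setup: trainable hidden weights theta_0 (m x d) and fixed
    output signs b in {-1, +1}."""
    W0 = rng.normal(size=(m, d))
    b = rng.choice([-1.0, 1.0], size=m)
    return W0, b

def value(W, b, x):
    """f(x; W) = (1 / sqrt(m)) * sum_r b_r * relu(W_r . x)."""
    m = W.shape[0]
    return b @ np.maximum(W @ x, 0.0) / np.sqrt(m)

def grad_value(W, b, x):
    """Gradient of f(x; W) with respect to W (shape m x d)."""
    m = W.shape[0]
    active = (W @ x > 0).astype(float)  # ReLU activation pattern
    return (b * active)[:, None] * x[None, :] / np.sqrt(m)

def project_to_ball(W, W0, omega):
    """Euclidean projection of W onto the ball B(W0, omega)."""
    diff = W - W0
    norm = np.linalg.norm(diff)  # Frobenius norm for a 2-D array
    if norm > omega:
        diff = diff * (omega / norm)
    return W0 + diff

def projected_td_step(W, W0, b, x, r, x_next, gamma, alpha, omega):
    """One semi-gradient TD(0) step followed by projection onto B(W0, omega)."""
    delta = r + gamma * value(W, b, x_next) - value(W, b, x)  # TD error
    W_new = W + alpha * delta * grad_value(W, b, x)
    return project_to_ball(W_new, W0, omega)
```

The projection keeps every iterate within distance ω of the initialization, which is the region over which ε, the quality of the best network, is defined.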