BriefGPT.xyz
Feb, 2019
Source Traces for Temporal Difference Learning
Silviu Pitis
TL;DR
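The paper proposes a learning algorithm based on source traces, which build on the model-based successor representation (SR), and proves that the algorithm converges. It also develops a new algorithm for learning the source map (the SR matrix) and explores several ways of working with the source or SR model. The results show that source traces can be combined effectively with other model-based methods.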
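As a rough illustration of what an SR matrix (source map) is in the tabular setting, the sketch below uses the standard successor-representation identity S = (I - γP)^-1 and V = S r. It is not the paper's source-trace algorithm, which this summary does not spell out. The three-state chain, the reward vector, the discount factor, and the function names `source_map` and `values_from_source_map` are all hypothetical choices made for the example.

```python
GAMMA = 0.9  # discount factor (assumed for illustration)

# Hypothetical 3-state Markov reward process; the numbers are illustrative only.
P = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
r = [0.0, 0.0, 1.0]  # expected one-step reward in each state


def source_map(P, gamma, sweeps=500):
    """Iterate S <- I + gamma * P S, which converges to (I - gamma * P)^-1."""
    n = len(P)
    S = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        S = [
            [
                (1.0 if i == j else 0.0)
                + gamma * sum(P[i][k] * S[k][j] for k in range(n))
                for j in range(n)
            ]
            for i in range(n)
        ]
    return S


def values_from_source_map(S, r):
    """V = S r: weight each state's reward by its discounted expected visits."""
    return [sum(s_ij * r_j for s_ij, r_j in zip(row, r)) for row in S]


S = source_map(P, GAMMA)
V = values_from_source_map(S, r)
```

Entry `S[i][j]` is the expected discounted number of visits to state `j` when starting from state `i`, so each row sums to 1 / (1 - γ). Once such a matrix is available, state values follow from a single matrix-vector product, which is the sense in which an SR model can stand in for repeated value backups.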
Abstract
This paper motivates and develops source traces for temporal difference (TD) learning in the tabular setting. Source traces are like eligibility traces, but model potential histories rather than immediate ones. T