BriefGPT.xyz
Jan, 2014
Kalman Temporal Differences
Matthieu Geist, Olivier Pietquin
TL;DR
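Introduces a new approximation framework, the Kalman Temporal Differences (KTD) framework, to address the scalability problem of value-function estimation in reinforcement learning. It provides the KTD and XKTD algorithms for deterministic and stochastic Markov decision processes, respectively, establishes their convergence, and shows better performance than existing algorithms.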
Abstract
Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest over the last decade. This contribution introduces a novel approximation scheme, namely the Kalman Temporal Differences (KTD) framework …
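The abstract only names the KTD framework. As a rough illustration of its core idea (treating the value-function parameters as the hidden state of a Kalman filter and each observed reward as a noisy measurement of a temporal difference), here is a minimal sketch restricted to a linear parameterization V(s) = φ(s)ᵀθ, where a plain Kalman filter update applies. It is not the paper's general KTD or XKTD algorithm. The random-walk evolution model, the noise variances `process_var` and `obs_var`, the function name `ktd_linear_update`, and the toy two-state chain are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def ktd_linear_update(
    theta: Vector,
    P: Matrix,
    phi_s: Vector,
    phi_next: Vector,
    reward: float,
    gamma: float,
    process_var: float,
    obs_var: float,
) -> Tuple[Vector, Matrix]:
    """One Kalman filter step for a linear value function V(s) = phi(s)^T theta.

    Hypothetical state-space model (a sketch, not the paper's general algorithm):
      theta_i = theta_{i-1} + v_i,  Var(v_i) = process_var * I   (random walk)
      r_i = (phi(s_i) - gamma * phi(s_{i+1}))^T theta_i + n_i,  Var(n_i) = obs_var
    """
    n = len(theta)
    # Prediction: the parameters follow a random walk, so only the covariance grows.
    P_pred = [[P[i][j] + (process_var if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    # Observation vector h, chosen so that the predicted reward is h^T theta.
    h = [phi_s[k] - gamma * phi_next[k] for k in range(n)]
    # Innovation: observed reward minus predicted reward (a TD error).
    innovation = reward - sum(h[k] * theta[k] for k in range(n))
    # P_pred h, and the innovation variance S = h^T P_pred h + obs_var.
    Ph = [sum(P_pred[i][k] * h[k] for k in range(n)) for i in range(n)]
    S = sum(h[i] * Ph[i] for i in range(n)) + obs_var
    # Kalman gain K = P_pred h / S.
    K = [Ph[i] / S for i in range(n)]
    # Correction of the mean and covariance: P = P_pred - K S K^T.
    new_theta = [theta[i] + K[i] * innovation for i in range(n)]
    new_P = [[P_pred[i][j] - K[i] * S * K[j] for j in range(n)] for i in range(n)]
    return new_theta, new_P


# Toy deterministic chain (hypothetical): s0 -> s1 with reward 0, then s1 -> terminal
# with reward 1. With one-hot features and gamma = 0.9, the true values are V(s0) = 0.9
# and V(s1) = 1.0; a terminal successor is encoded by an all-zero feature vector.
theta, P = [0.0, 0.0], [[10.0, 0.0], [0.0, 10.0]]
for _ in range(50):
    theta, P = ktd_linear_update(theta, P, [1.0, 0.0], [0.0, 1.0], 0.0, 0.9, 1e-6, 1.0)
    theta, P = ktd_linear_update(theta, P, [0.0, 1.0], [0.0, 0.0], 1.0, 0.9, 1e-6, 1.0)
print(theta)  # approaches [0.9, 1.0]
```

A nonzero `process_var` lets the filter keep tracking parameters that drift over time, at the cost of a noisier estimate; setting it to zero turns the update into a recursive least-squares fit of the observed transitions.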