Kalman Temporal Differences

Jan, 2014

Matthieu Geist, Olivier Pietquin

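TL;DR: Introduces a new approximation framework, Kalman Temporal Differences (KTD), to address the scalability of value function estimation in reinforcement learning. Provides the KTD and XKTD algorithms for deterministic and stochastic Markov decision processes, and demonstrates their convergence and better performance than existing algorithms.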

Abstract

Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest this last decade. This contribution introduces a novel approximation scheme, namely the →