Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor
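TL;DR: This paper studies Real Time Dynamic Programming (RTDP), an online algorithm based on dynamic programming, and proposes a multi-step greedy version, $h$-RTDP. Increasing the lookahead horizon $h$ improves sample and space complexity over previous algorithms, at the cost of additional computation. The paper also analyzes performance in three approximate settings and proves that, in each, the asymptotic performance of $h$-RTDP matches that of the corresponding approximate DP algorithm.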
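For reference, the multi-step greedy ($h$-step lookahead) policy can be written with the Bellman optimality operator $T$. This is the standard formulation, stated here in an assumed stationary notation with discount $\gamma$ (the paper itself works with finite-horizon MDPs, where values are also indexed by time step). Setting $h = 1$ recovers the 1-step greedy policy that plain RTDP acts by.

$$
(TV)(s) = \max_{a} \Big[ r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, V(s') \Big],
\qquad
\pi_h(s) \in \arg\max_{a} \Big[ r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, (T^{h-1}V)(s') \Big]
$$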
Abstract
Real Time Dynamic Programming (RTDP) is a well-known Dynamic Programming (DP) based algorithm that combines planning and learning to find an optimal policy for an MDP. It is a planning algorithm because it uses t
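To make the planning/learning split concrete, here is a minimal sketch of tabular, finite-horizon RTDP with a known model. The following are assumptions, not taken from the source: rewards lie in [0, 1], values are initialized optimistically to H - t, the model is given as nested dicts `P[s][a] = [(next_state, prob), ...]` and `R[s][a]`, and the function name `rtdp` is hypothetical. The planning part is the 1-step greedy action choice computed from the model; the learning part is that the Bellman backup is applied only at the states actually visited.

```python
import random


def rtdp(P, R, s0, H, episodes, rng=None):
    """Hypothetical finite-horizon RTDP sketch (tabular, known model).

    Assumed model format: P[s][a] is a list of (next_state, prob) pairs,
    R[s][a] is a reward in [0, 1]. Returns V, where V[t][s] estimates the
    optimal value of state s at time step t.
    """
    rng = rng or random.Random(0)
    states = list(P.keys())
    # Optimistic init: with rewards in [0, 1], H - t upper-bounds any return
    # collectable from step t onward; V[H][s] = 0 at the end of the horizon.
    V = [{s: float(H - t) for s in states} for t in range(H + 1)]

    def q(t, s, a):
        # One-step lookahead using the model and the current estimate of V[t + 1].
        return R[s][a] + sum(p * V[t + 1][s2] for s2, p in P[s][a])

    for _ in range(episodes):
        s = s0
        for t in range(H):
            # Planning: 1-step greedy action w.r.t. the optimistic value estimate.
            best = max(P[s], key=lambda a: q(t, s, a))
            # Learning: Bellman backup only at the visited state, not all states.
            V[t][s] = q(t, s, best)
            # Acting: sample the next state from the model.
            nexts, probs = zip(*P[s][best])
            s = rng.choices(nexts, weights=probs)[0]
    return V
```

For example, with two states where `'right'` moves from `'A'` to `'B'` with reward 1 and `'stay'` in `'B'` also pays 1, `rtdp(P, R, 'A', H=3, episodes=5)[0]['A']` is `3.0`. Because values start as upper bounds, unvisited states look attractive until visited, which is how RTDP drives exploration without sweeping the whole state space.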