BriefGPT.xyz
Sep, 2018
Finite Sample Analysis of the GTD Policy Evaluation Algorithms in Markov Setting
Yue Wang, Wei Chen, Yuting Liu, Zhi-Ming Ma, Tie-Yan Liu
TL;DR
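This paper gives the first finite-sample bound analysis of GTD algorithms in a Markov setting. It proves that GTD algorithms with varying step sizes converge, at a rate that depends on the step sizes and on the mixing time of the Markov process, and it shows that the experience replay trick helps convergence by improving the mixing of the Markov process.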
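The TL;DR does not spell out the update rule. As a minimal sketch, assuming the original GTD update of Sutton, Szepesvári and Maei (2008) with linear features (this summary does not say which GTD variants or step-size schedules the paper actually analyzes), the code below runs GTD on a single trajectory of a small Markov chain, so consecutive samples are correlated rather than i.i.d. The function name `gtd_policy_evaluation`, the chain `P`, the rewards `r`, the discount `gamma`, and the step sizes are all hypothetical illustration values, not taken from the paper. The step sizes are passed as callables so that a varying schedule can be plugged in.

```python
import numpy as np


def gtd_policy_evaluation(P, r, features, gamma, alpha, beta, num_steps, seed=0):
    """Hypothetical sketch of the original GTD update on one Markov trajectory.

    P        : (n, n) transition matrix of the chain induced by the policy
    r        : (n,) reward received in each state
    features : (n, k) feature matrix; row s is phi(s)
    alpha    : callable t -> step size for the value weights theta
    beta     : callable t -> step size for the auxiliary vector u
    """
    rng = np.random.default_rng(seed)
    n, k = features.shape
    theta = np.zeros(k)  # value weights: V(s) is approximated by phi(s) @ theta
    u = np.zeros(k)      # auxiliary estimate of E[delta * phi]
    s = 0                # the trajectory starts in state 0 (not the stationary distribution)
    for t in range(num_steps):
        s_next = rng.choice(n, p=P[s])  # correlated sample: next state of the chain
        phi, phi_next = features[s], features[s_next]
        delta = r[s] + gamma * phi_next @ theta - phi @ theta  # TD error
        # GTD: theta moves along (phi - gamma * phi') * (phi^T u) using the old u
        theta_new = theta + alpha(t) * (phi - gamma * phi_next) * (phi @ u)
        # u tracks delta * phi with its own step size
        u = u + beta(t) * (delta * phi - u)
        theta = theta_new
        s = s_next
    return theta


# Hypothetical 3-state chain used only for illustration.
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])
r = np.array([1.0, 0.0, 0.5])
gamma = 0.5
features = np.eye(3)  # one-hot (tabular) features

theta = gtd_policy_evaluation(P, r, features, gamma,
                              alpha=lambda t: 0.02, beta=lambda t: 0.2,
                              num_steps=50_000)
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)  # V = (I - gamma P)^{-1} r
print(np.round(features @ theta, 2), np.round(v_true, 2))
```

With one-hot features the TD fixed point coincides with the true value function, so the estimate can be checked against the closed-form solution. On a chain that mixes more slowly (for example, with self-transition probabilities close to 1), one would expect the same number of steps to give a noticeably worse estimate, which is the kind of dependence on mixing time the TL;DR refers to.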
Abstract
In reinforcement learning (RL), one of the key components is policy evaluation, which aims to estimate the value function (i.e., expected long-term accumulated reward) of a policy. With a good