BriefGPT.xyz
Jun, 2020
Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity
Zihan Zhang, Yuan Zhou, Xiangyang Ji
TL;DR
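This paper proposes a model-free algorithm for learning a policy in a discounted Markov decision process. The algorithm succeeds with probability (1-p) and has sample complexity O(SA ln(1/p)/(ε^2 (1-γ)^3)), where S is the number of states, A is the number of actions, γ is the discount factor, and ε is the approximation threshold.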
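To make the scaling of this bound concrete, the sketch below evaluates only the expression inside the O(·), SA ln(1/p)/(ε^2 (1-γ)^3). It is an illustration, not the paper's algorithm: the function name `sample_complexity_scale` and the example parameter values are hypothetical, and the constants and any other factors absorbed by the O(·) notation are dropped, so only ratios between values are meaningful.

```python
import math


def sample_complexity_scale(S: int, A: int, gamma: float, eps: float, p: float) -> float:
    """Hypothetical helper: evaluate S*A*ln(1/p) / (eps^2 * (1-gamma)^3).

    Constants and anything else hidden by the O(.) notation are omitted,
    so the result is only useful for comparing how the bound scales.
    """
    if not (0.0 < gamma < 1.0):
        raise ValueError("gamma must lie in (0, 1)")
    if not (0.0 < p < 1.0):
        raise ValueError("p must lie in (0, 1)")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    return S * A * math.log(1.0 / p) / (eps ** 2 * (1.0 - gamma) ** 3)


# Illustrative (made-up) MDP size and accuracy targets.
base = sample_complexity_scale(S=100, A=10, gamma=0.9, eps=0.1, p=0.05)

# Halving eps multiplies the expression by about 4 (the eps^-2 dependence).
eps_ratio = sample_complexity_scale(S=100, A=10, gamma=0.9, eps=0.05, p=0.05) / base

# Moving gamma from 0.9 to 0.99 shrinks (1 - gamma) tenfold,
# multiplying the expression by about 1000 (the (1 - gamma)^-3 dependence).
gamma_ratio = sample_complexity_scale(S=100, A=10, gamma=0.99, eps=0.1, p=0.05) / base

print(eps_ratio, gamma_ratio)
```

The ratios show that, under this bound, the horizon term 1/(1-γ) and the accuracy ε dominate the cost, while the failure probability p enters only logarithmically.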
Abstract
In this paper we consider the problem of learning an $\epsilon$-optimal policy for a discounted Markov decision process (MDP). Given an MDP with $S$ states, $A$ actions, the