May, 2023
Regret-Optimal Model-Free Reinforcement Learning for Discounted MDPs with Short Burn-In Time
Xiang Ji, Gen Li
TL;DR
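This paper proposes a model-free algorithm that uses variance reduction and a novel execution policy to address a gap in reinforcement learning for Markov decision processes: existing methods either fail to achieve regret optimality or require a long burn-in time. The proposed algorithm attains optimal sample efficiency with a short burn-in time.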
Abstract
A crucial problem in reinforcement learning is learning the optimal policy. We study this in tabular infinite-horizon discounted Markov decision processes under the online setting. The existing algorithms either …