BriefGPT.xyz
June 2023
Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes
Zihan Zhang, Qiaomin Xie
TL;DR
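We propose several provably efficient model-free reinforcement learning algorithms for average-reward Markov decision processes, including an online model-free algorithm based on reference-advantage decomposition and a model-free algorithm for the simulator setting. These algorithms achieve better discounted approximation and efficient construction of confidence regions.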
Abstract
We develop several provably efficient model-free reinforcement learning (RL) algorithms for infinite-horizon average-reward Markov decision processes (MDPs). We consider both the online setting and the setting with a