BriefGPT.xyz
Oct, 2023
Towards Instance-Optimality in Online PAC Reinforcement Learning
Aymen Al-Marjani, Andrea Tirinzoni, Emilie Kaufmann
TL;DR
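This paper proposes the first instance-dependent lower bound on the sample complexity of PAC identification of a near-optimal policy in any tabular episodic Markov decision process (MDP), and shows that the sample complexity of the PEDEL algorithm nearly matches this lower bound. Given PEDEL's computational cost, we pose the open question of whether a computationally efficient algorithm can attain our lower bound.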
Abstract
Several recent works have proposed instance-dependent upper bounds on the number of episodes needed to identify, with probability $1-\delta$, an $\varepsilon$-optimal policy in finite-horizon tabular Markov decision processes (MDPs).
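To make "$\varepsilon$-optimal policy" concrete: in a finite-horizon tabular MDP, a policy $\pi$ is $\varepsilon$-optimal from the initial state $s_1$ if $V^{\pi}_1(s_1) \ge V^{\star}_1(s_1) - \varepsilon$, and a PAC learner must output such a policy with probability at least $1-\delta$. The sketch below checks this condition by backward induction. It is a minimal illustration that assumes the transitions and rewards are known. The learner studied in the paper does not know them and must estimate them from episodes, so this is not the paper's method and not PEDEL. The names `values`, `is_eps_optimal`, and the nested-list layout are hypothetical.

```python
from typing import List, Optional

# Hypothetical layout (an assumption for this sketch, not from the paper):
#   P[h][s][a][t] = probability of moving from state s to state t under action a at step h
#   R[h][s][a]    = expected reward for taking action a in state s at step h
#   policy[h][s]  = action chosen by a deterministic policy at step h in state s
Transitions = List[List[List[List[float]]]]
Rewards = List[List[List[float]]]
Policy = List[List[int]]


def values(P: Transitions, R: Rewards, policy: Optional[Policy] = None) -> List[List[float]]:
    """Backward induction: returns V[h][s] for h = 0..H, with V[H] = 0.

    With policy=None this computes the optimal values V*; otherwise V^pi.
    """
    H, S, A = len(R), len(R[0]), len(R[0][0])
    V = [[0.0] * S for _ in range(H + 1)]
    for h in reversed(range(H)):
        for s in range(S):
            # Q_h(s, a) = r_h(s, a) + sum_t p_h(t | s, a) * V_{h+1}(t)
            q = [R[h][s][a] + sum(p * v for p, v in zip(P[h][s][a], V[h + 1]))
                 for a in range(A)]
            V[h][s] = max(q) if policy is None else q[policy[h][s]]
    return V


def is_eps_optimal(P: Transitions, R: Rewards, policy: Policy, s1: int, eps: float) -> bool:
    """True if V^pi_1(s1) >= V*_1(s1) - eps (steps are 0-indexed in the code)."""
    return values(P, R, policy)[0][s1] >= values(P, R)[0][s1] - eps


# Toy MDP: 2 steps, 2 states, 2 actions. Action 0 stays put, action 1 switches
# state; every action in state 1 pays reward 1, every action in state 0 pays 0.
H, S, A = 2, 2, 2
P = [[[[1.0 if t == (s if a == 0 else 1 - s) else 0.0 for t in range(S)]
       for a in range(A)] for s in range(S)] for _ in range(H)]
R = [[[float(s) for _ in range(A)] for s in range(S)] for _ in range(H)]
stay = [[0] * S for _ in range(H)]
switch = [[1] * S for _ in range(H)]

print(values(P, R)[0])                       # [1.0, 2.0]
print(is_eps_optimal(P, R, switch, 0, 0.1))  # True
print(is_eps_optimal(P, R, stay, 0, 0.1))    # False
```

In this toy instance, "always switch" is optimal from state 0 (value 1), while "always stay" earns 0, so it is only $\varepsilon$-optimal for $\varepsilon \ge 1$.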