BriefGPT.xyz
Feb, 2021
RL for Latent MDPs: Regret Guarantees and a Lower Bound
Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
TL;DR
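In this paper, we consider the regret minimization problem for reinforcement learning in latent Markov decision processes, and we propose an efficient algorithm with local guarantees to solve it.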
Abstract
In this work, we consider the regret minimization problem for reinforcement learning in latent Markov decision processes (LMDP). In an LMD…