BriefGPT.xyz
Feb, 2022
Provable Reinforcement Learning with a Short-Term Memory
Yonathan Efroni, Chi Jin, Akshay Krishnamurthy, Sobhan Miryoosefi
TL;DR
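This paper studies how to learn partially observable Markov decision processes (POMDPs). It introduces a special subclass of POMDPs whose latent states can be decoded from the most recent history. Using a novel moment-matching approach, the authors establish upper and lower bounds on the sample complexity of learning near-optimal policies for this class of problems in both the tabular and rich-observation settings, and show that a short-term memory suffices for reinforcement learning in these environments.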
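To make the decodability idea concrete, here is a minimal toy sketch, not the paper's algorithm or construction. It assumes a hypothetical environment in which the latent state is a (position, velocity) pair on a small ring, the agent observes only the position, and the action sets the next velocity. A single observation cannot reveal the velocity, but the last two observations can. The names `RING`, `step`, `observe`, `decode`, and `rollout` are all illustrative assumptions.

```python
import random

RING = 5  # number of positions on the ring (hypothetical toy value, must be >= 3)


def step(state, action):
    """Latent transition: the action (+1 or -1) becomes the velocity, then the position moves."""
    pos, _ = state
    vel = action
    return ((pos + vel) % RING, vel)


def observe(state):
    """The agent sees only the position; the velocity stays hidden."""
    pos, _ = state
    return pos


def decode(window):
    """Recover the latent state from the last two observations (memory length 2)."""
    prev_obs, cur_obs = window
    vel = 1 if (cur_obs - prev_obs) % RING == 1 else -1
    return (cur_obs, vel)


def rollout(horizon, seed=0):
    """Run random actions and pair each true latent state with its decoded estimate."""
    rng = random.Random(seed)
    state = (rng.randrange(RING), rng.choice([-1, 1]))
    obs = [observe(state)]
    pairs = []
    for _ in range(horizon):
        state = step(state, rng.choice([-1, 1]))
        obs.append(observe(state))
        pairs.append((state, decode(tuple(obs[-2:]))))
    return pairs


if __name__ == "__main__":
    pairs = rollout(horizon=50)
    print(all(true == decoded for true, decoded in pairs))
```

In this toy, the states `(2, 1)` and `(2, -1)` produce the same observation, so a memoryless agent cannot tell them apart, while a window of two observations identifies the state exactly. This is the same intuition behind stacking recent frames as the agent's input.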
Abstract
Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions. C