BriefGPT.xyz
Jun, 2022
Learning in Observable POMDPs, without Computationally Intractable Oracles
Noah Golowich, Ankur Moitra, Dhruv Rohatgi
TL;DR
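This paper presents an oracle-free, quasipolynomial-time algorithm for learning in partially observable Markov decision processes (POMDPs). Rather than relying on the traditional exploration-exploitation principle, it uses barycentric spanners to construct policy covers, and its guarantees rest on an assumption relating distributions over states to distributions over observations.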
Abstract
Much of reinforcement learning theory is built on top of oracles that are computationally hard to implement. Specifically for learning near-optimal policies in partially observable Markov decision processes (POMD