BriefGPT.xyz
Mar, 2018
Oracle-Efficient PAC RL with Rich Observations
On Polynomial Time PAC Reinforcement Learning with Rich Observations
Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford...
TL;DR
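This paper studies the computational complexity of PAC reinforcement learning with rich observations. It presents provably sample-efficient algorithms for environments with deterministic hidden-state dynamics and stochastic rich observations. It also proves that, when the hidden-state dynamics are stochastic, the known sample-efficient algorithm OLIVE cannot be implemented in the oracle model, and it uses several examples to illustrate the fundamental challenges of computationally tractable PAC reinforcement learning in such general settings.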
Abstract
We study the computational tractability of provably sample-efficient (PAC) reinforcement learning in episodic environments with high-dimensional observations. We present new sample efficient algorithms for environments with deterministic