Oracle-Efficient PAC RL with Rich Observations

Mar, 2018

On Polynomial Time PAC Reinforcement Learning with Rich Observations

Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford...

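TL;DR: This paper studies the computational complexity of PAC reinforcement learning with rich observations. It presents provably sample-efficient algorithms for environments with deterministic hidden-state dynamics and stochastic rich observations. It also proves that, when the hidden-state dynamics are stochastic, the known sample-efficient algorithm OLIVE cannot be implemented in the oracle model. Several examples illustrate the fundamental challenges of computationally tractable PAC reinforcement learning in such general settings.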

Abstract

We study the computational tractability of provably sample-efficient (PAC) reinforcement learning in episodic environments with high-dimensional observations. We present new sample efficient algorithms for environments with deterministic →