This paper studies systematic exploration for reinforcement learning with
rich observations and function approximation. We introduce a new model, called
contextual decision processes, that unifies and generalizes most prior
settings. Our first contribution is a complexity measure, the Bellman rank,
that we show enables tractable learning of near-optimal behavior in these
processes and is naturally small for many well-studied reinforcement learning
settings. Our second contribution is a new reinforcement learning algorithm
that engages in systematic exploration to learn contextual decision processes
with low Bellman rank. Our algorithm provably learns near-optimal behavior with
a number of samples that is polynomial in all relevant parameters but
independent of the number of unique observations. The approach uses Bellman
error minimization with optimistic exploration and provides new insights into
efficient exploration for reinforcement learning with function approximation.

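This paper studies systematic exploration for reinforcement learning with rich observations and function approximation. It introduces a new model, contextual decision processes, that unifies and generalizes most prior settings, and proposes a new reinforcement learning algorithm. The algorithm uses the Bellman rank as its complexity measure and minimizes Bellman error with optimistic exploration. It is guaranteed to learn near-optimal behavior with a number of samples that is polynomial in all relevant parameters, and it provides new insights for reinforcement learning.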