High-dimensional observations and complex real-world dynamics pose major challenges for reinforcement learning, both for function approximation and for exploration. We address these challenges with two complementary techniques. First, we develop a gradient-boosting-style, non-parametric function approximator that learns on $Q$-function residuals. Second, we propose an exploration strategy inspired by the principles of state abstraction and information acquisition under uncertainty. We demonstrate the empirical effectiveness of these techniques first, as a preliminary check, on two standard tasks (Blackjack and $n$-Chain), and then on two much larger and more realistic tasks with high-dimensional observation spaces. Specifically, we introduce two benchmarks built within the game Minecraft, where the observations are pixel arrays of the agent's visual field. A combination of our two techniques performs competitively on the standard reinforcement-learning tasks while consistently and substantially outperforming baselines on the two tasks with high-dimensional observation spaces. The new function approximator, exploration strategy, and evaluation benchmarks are each of independent interest in the pursuit of reinforcement-learning methods that scale to real-world domains.
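The abstract does not specify the weak learner, loss, feature map, or how regression targets are formed, so the following is only a minimal sketch of the general idea of gradient boosting on $Q$-function residuals, not the authors' algorithm. It assumes squared loss, depth-1 regression trees ("stumps") as weak learners, a vector state with a one-hot action encoding appended, a fixed shrinkage rate, and bootstrapped one-step targets computed over a fixed batch of transitions (in the style of fitted $Q$-iteration). The names `BoostedQ`, `fit_stump`, and `boost_round` are hypothetical. Each boosting round computes the residual between a one-step Bellman target and the current ensemble's prediction, then fits one new stump to those residuals.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[Sequence[float], int, float, Sequence[float], bool]


@dataclass
class Stump:
    """Depth-1 regression tree used as the weak learner (an assumption)."""
    feature: int
    threshold: float
    left: float   # prediction when x[feature] <= threshold
    right: float  # prediction otherwise

    def predict(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(xs: List[List[float]], ys: List[float]) -> Stump:
    """Least-squares stump; falls back to a constant (mean) predictor."""
    mean = sum(ys) / len(ys)
    best = Stump(0, float("inf"), mean, mean)  # always takes the left branch
    best_err = sum((y - mean) ** 2 for y in ys)
    for j in range(len(xs[0])):
        for t in sorted({x[j] for x in xs}):
            left = [y for x, y in zip(xs, ys) if x[j] <= t]
            right = [y for x, y in zip(xs, ys) if x[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if err < best_err:
                best, best_err = Stump(j, t, lm, rm), err
    return best


class BoostedQ:
    """Q(s, a) as a shrunken sum of stumps, each fit to Bellman residuals."""

    def __init__(self, n_actions: int, gamma: float = 0.9, lr: float = 0.5):
        self.n_actions = n_actions
        self.gamma = gamma
        self.lr = lr  # shrinkage applied to every weak learner
        self.stumps: List[Stump] = []

    def features(self, s: Sequence[float], a: int) -> List[float]:
        # Raw state concatenated with a one-hot action encoding (an assumption).
        return list(s) + [1.0 if i == a else 0.0 for i in range(self.n_actions)]

    def q(self, s: Sequence[float], a: int) -> float:
        x = self.features(s, a)
        return sum(self.lr * h.predict(x) for h in self.stumps)

    def boost_round(self, batch: List[Transition]) -> None:
        xs, residuals = [], []
        for s, a, r, s2, done in batch:
            # Bootstrapped one-step target computed from the current ensemble.
            target = r if done else r + self.gamma * max(
                self.q(s2, b) for b in range(self.n_actions))
            xs.append(self.features(s, a))
            # Negative gradient of the squared loss, holding the target fixed.
            residuals.append(target - self.q(s, a))
        self.stumps.append(fit_stump(xs, residuals))


if __name__ == "__main__":
    # Two-state chain: s=[0] -> s=[1] with reward 0, then reward 1 and stop.
    chain = [([0.0], 0, 0.0, [1.0], False), ([1.0], 0, 1.0, [1.0], True)]
    model = BoostedQ(n_actions=1, gamma=0.9, lr=0.5)
    for _ in range(60):
        model.boost_round(chain)
    # The two values should approach gamma * 1.0 = 0.9 and 1.0 respectively.
    print(round(model.q([0.0], 0), 3), round(model.q([1.0], 0), 3))
```

Because each round only adds a learner fit to what the current ensemble still gets wrong, the representation grows with training rather than being fixed in advance, which is one reading of "non-parametric" here. A practical version would likely use deeper trees and features learned from pixels rather than raw state vectors.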

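This work proposes a non-parametric function approximator and an exploration strategy based on state abstraction and information acquisition under uncertainty to address the challenges of reinforcement learning in high-dimensional environments. The techniques are evaluated on standard tasks and on new benchmarks built in the game Minecraft. The results show that combining the two techniques performs competitively on standard reinforcement-learning tasks and outperforms baseline algorithms on tasks with high-dimensional observation spaces, offering a promising approach for applying reinforcement learning in real-world settings.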

Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains