This paper considers batch Reinforcement Learning (RL) with general value function approximation. Our study investigates the minimal assumptions needed to reliably estimate and minimize the Bellman error, and characterizes the generalization performance by the (local) Rademacher complexities of general function classes, which takes initial steps toward bridging the gap between statistical learning theory and batch RL. Concretely, we view the Bellman error as a surrogate loss for the optimality gap, and prove the following: (1) In the double sampling regime, the excess risk of the Empirical Risk Minimizer (ERM) is bounded by the Rademacher complexity of the function class. (2) In the single sampling regime, sample-efficient risk minimization is not possible without further assumptions, regardless of the algorithm. However, with completeness assumptions, the excess risk of FQI and of a minimax-style algorithm can again be bounded by the Rademacher complexity of the corresponding function classes. (3) Fast statistical rates can be achieved by using the tools of local Rademacher complexity. Our analysis covers a wide range of function classes, including finite classes, linear spaces, kernel spaces, and sparse linear features.
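The distinction between the two sampling regimes can be illustrated with a small sketch. It assumes the usual reading of the terms, which the abstract does not spell out: in the double sampling regime two independent next states are available for the same state-action pair, while in the single sampling regime only one is. The function name, arguments, and the toy numbers below are hypothetical and chosen only for illustration; the code computes exact expectations at a single state-action pair rather than reproducing any algorithm from the paper.

```python
from itertools import product

GAMMA = 0.9  # illustrative discount factor (an assumption, not from the paper)


def bellman_residual_estimates(q_sa, reward, next_values, probs, gamma=GAMMA):
    """Exact expectations of two estimators of the squared Bellman error
    at one (s, a) pair.

    q_sa           : f(s, a), the candidate value at the pair
    reward         : deterministic reward r(s, a)
    next_values[i] : V_f(s'_i) = max_a' f(s'_i, a') for next state s'_i
    probs[i]       : transition probability P(s'_i | s, a)
    """
    # True Bellman backup (T f)(s, a) and the true squared Bellman error.
    backup = reward + gamma * sum(p * v for p, v in zip(probs, next_values))
    true_error = (q_sa - backup) ** 2

    # Single sampling: E[(f(s,a) - r - gamma * V_f(s'))^2] over one draw of s'.
    single = sum(
        p * (q_sa - reward - gamma * v) ** 2
        for p, v in zip(probs, next_values)
    )

    # Double sampling: E[(f - r - gamma V_f(s'_1)) * (f - r - gamma V_f(s'_2))]
    # over two independent draws s'_1, s'_2 from the same (s, a).
    double = sum(
        p1 * p2
        * (q_sa - reward - gamma * v1)
        * (q_sa - reward - gamma * v2)
        for (p1, v1), (p2, v2) in product(zip(probs, next_values), repeat=2)
    )
    return true_error, single, double


if __name__ == "__main__":
    true_error, single, double = bellman_residual_estimates(
        q_sa=1.0, reward=0.0, next_values=[0.0, 2.0], probs=[0.5, 0.5]
    )
    print(f"true squared Bellman error : {true_error:.4f}")
    print(f"single-sample expectation  : {single:.4f}")
    print(f"double-sample expectation  : {double:.4f}")
```

In this toy case the single-sample estimator exceeds the true squared Bellman error by gamma^2 times the variance of V_f(s'), whereas the product of two independent residuals matches it in expectation. This is the standard reason the single sampling regime is harder, and it is offered here as background rather than as the paper's own argument.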

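This paper studies batch reinforcement learning with general value function approximation, examining the assumptions needed to estimate and minimize the Bellman error. It characterizes the generalization performance of algorithms through the (local) Rademacher complexity of a range of function classes, including finite classes, linear spaces, kernel spaces, and sparse linear features. The main findings are: in the double sampling regime, the excess risk of the empirical risk minimizer is bounded by the Rademacher complexity of the function class; under completeness assumptions, the excess risk of FQI and of a minimax-style algorithm is again bounded by the Rademacher complexity of the corresponding function classes; and fast statistical rates can be achieved through local Rademacher complexity.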

Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning