Offline reinforcement learning, which aims at optimizing sequential
decision-making strategies with historical data, has been extensively applied
in real-life applications. State-of-the-art algorithms usually leverage
powerful function approximators (e.g., neural networks) to alleviate the sample
complexity hurdle and achieve better empirical performance. Despite these successes, a
more systematic understanding of the statistical complexity for function
approximation remains lacking. As a step toward bridging this gap, we consider
offline reinforcement learning with differentiable function class
approximation (DFA). This function class naturally incorporates a wide range of
models with nonlinear/nonconvex structures. Most importantly, we show that offline
RL with differentiable function approximation is provably efficient by
analyzing the pessimistic fitted Q-learning (PFQL) algorithm, and our results
provide the theoretical basis for understanding a variety of practical
heuristics that rely on Fitted Q-Iteration-style designs. In addition, we
further improve our guarantee with a tighter instance-dependent
characterization. We hope our work will draw interest in studying
reinforcement learning with differentiable function approximation beyond the
scope of current research.

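Offline reinforcement learning with differentiable function class approximation is widely used in practice; it incorporates a variety of models with nonlinear and nonconvex structures and can significantly improve algorithm performance. This paper analyzes a pessimistic algorithm and proves that the approach is effective, providing a new theoretical basis for studying reinforcement learning with differentiable function approximation.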