Distributionally robust offline reinforcement learning (RL), which seeks
robust policy training against environment perturbation by modeling dynamics
uncertainty, calls for function approximations when facing large state-action
spaces. However, the consideration of dynamics uncertainty introduces essential
nonlinearity and computational burden, posing unique challenges for analyzing
and practically employing function approximation. Focusing on a basic setting
where the nominal model and perturbed models are linearly parameterized, we
propose minimax optimal and computationally efficient algorithms realizing
function approximation and initiate the study on instance-dependent
suboptimality analysis in the context of robust offline RL. Our results uncover
that function approximation in robust offline RL is essentially distinct from
and probably harder than that in standard offline RL. Our algorithms and
theoretical results crucially depend on a variety of new techniques, involving
a novel function approximation mechanism incorporating variance information, a
new procedure of suboptimality and estimation uncertainty decomposition, a
quantification of the robust value function shrinkage, and a meticulously
designed family of hard instances, which might be of independent interest.

分布式鲁棒离线强化学习是针对环境扰动进行鲁棒策略训练的一种方法，当面对大规模状态 - 动作空间时需要进行函数逼近。本研究提出了一种最小极大值最优算法，通过对线性参数化的模型进行实现，探索了实例依赖次优性分析在鲁棒离线强化学习中的应用，并揭示了鲁棒离线强化学习中的函数逼近与标准离线强化学习所面临的困难之间的本质区别。