Offline reinforcement learning (RL) provides a promising direction for exploiting massive amounts of offline data in complex decision-making tasks. Because of distribution shift, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism impairs the robustness of learned policies: even a small perturbation of the observations can lead to a significant change in the policy's behavior. To trade off robustness and conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, together with additional conservative value estimation on these out-of-distribution (OOD) states. Theoretically, we show that RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL achieves state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations.

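This post introduces Robust Offline Reinforcement Learning (RORL), an algorithm built on a conservative smoothing technique. It targets the lack of robustness that current offline RL algorithms show when observations are perturbed in real environments, trades off performance against robustness, and achieves strong results on both.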
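To make the two ingredients named in the abstract more concrete (smoothing the value function for states near the dataset, and estimating values conservatively on those perturbed states), here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's exact objective: the L-infinity perturbation ball, the random sampling of perturbed states, the squared-gap penalty, the ensemble-based pessimism, and all names (`sample_perturbed_states`, `q_smoothing_penalty`, `conservative_ood_target`, `epsilon`, `beta`) are hypothetical choices made for this example.

```python
import random


def sample_perturbed_states(state, epsilon, n_samples, rng):
    """Draw states uniformly from the L-infinity ball of radius epsilon around `state`.

    Assumption: the neighborhood "near the dataset" is modeled as an L-infinity ball.
    """
    return [
        [x + rng.uniform(-epsilon, epsilon) for x in state]
        for _ in range(n_samples)
    ]


def q_smoothing_penalty(q_fn, state, action, epsilon, n_samples=8, rng=None):
    """Largest squared gap between Q(s, a) and Q(s_hat, a) over sampled perturbations.

    Assumption: the worst case over the ball is approximated by random sampling.
    """
    rng = rng or random.Random(0)
    q_clean = q_fn(state, action)
    gaps = [
        (q_fn(s_hat, action) - q_clean) ** 2
        for s_hat in sample_perturbed_states(state, epsilon, n_samples, rng)
    ]
    return max(gaps)


def conservative_ood_target(q_ensemble, s_hat, action, beta):
    """Pessimistic value for a perturbed state: ensemble mean minus beta * ensemble std.

    Assumption: disagreement among an ensemble of Q-functions stands in for uncertainty.
    """
    values = [q(s_hat, action) for q in q_ensemble]
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return mean - beta * std


if __name__ == "__main__":
    # Toy linear Q-function, used only to exercise the helpers.
    def toy_q(state, action):
        return sum(state) + action

    print(q_smoothing_penalty(toy_q, [1.0, 2.0], 0.5, epsilon=0.1))
    ensemble = [lambda s, a: 1.0, lambda s, a: 3.0]
    print(conservative_ood_target(ensemble, [1.0, 2.0], 0.5, beta=0.5))
```

In a training loop, a penalty like `q_smoothing_penalty` would be added to the usual Bellman loss with a small weight, and a target like `conservative_ood_target` would pull Q-values down on perturbed states where the ensemble disagrees. Increasing `epsilon` or `beta` pushes the balance toward robustness and conservatism, at a possible cost in performance on clean observations.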

RORL: A Robust Offline Reinforcement Learning Algorithm Based on Conservative Smoothing