Offline reinforcement learning (RL) offers a promising direction for learning policies from pre-collected datasets without requiring further interactions with the environment. However, existing methods struggle to handle out-of-distribution (OOD) extrapolation errors, especially in sparse reward or scarce data settings. In this paper, we propose a novel training algorithm called Conservative Density Estimation (CDE), which addresses this challenge by explicitly imposing constraints on the state-action occupancy stationary distribution. CDE overcomes the limitations of existing approaches, such as the stationary distribution correction method, by addressing the support mismatch issue in marginal importance sampling. Our method achieves state-of-the-art performance on the D4RL benchmark. Notably, CDE consistently outperforms baselines in challenging tasks with sparse rewards or insufficient data, demonstrating the advantages of our approach in addressing the extrapolation error problem in offline RL.
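
The support mismatch issue in marginal importance sampling mentioned above can be illustrated with a small tabular sketch. This is not the CDE algorithm, only a toy example under stated assumptions: the function names `mis_estimate` and `uncovered_mass`, the three state-action pairs, and all numbers are hypothetical. The idea is that the ratio w = d_pi / d_data is only defined where the dataset distribution has mass, so any occupancy the policy places outside the dataset's support is silently dropped from the estimate.

```python
import numpy as np


def mis_estimate(d_pi, d_data, reward):
    """Marginal importance sampling estimate of E_{d_pi}[reward].

    Computes sum over (s, a) of d_data * w * reward with w = d_pi / d_data.
    The ratio is only defined where d_data > 0, so pairs outside the
    dataset's support contribute nothing to the estimate.
    """
    d_pi = np.asarray(d_pi, dtype=float)
    d_data = np.asarray(d_data, dtype=float)
    reward = np.asarray(reward, dtype=float)
    covered = d_data > 0
    w = np.zeros_like(d_pi)
    w[covered] = d_pi[covered] / d_data[covered]
    return float(np.sum(d_data * w * reward))


def uncovered_mass(d_pi, d_data):
    """Policy occupancy mass that falls where the dataset has no samples."""
    d_pi = np.asarray(d_pi, dtype=float)
    d_data = np.asarray(d_data, dtype=float)
    return float(np.sum(d_pi[d_data == 0]))


# Hypothetical occupancies over three state-action pairs.
d_data = [0.5, 0.5, 0.0]          # the dataset never visits the third pair
reward = [1.0, 0.0, 2.0]
d_pi_in = [0.75, 0.25, 0.0]       # policy stays inside the dataset support
d_pi_out = [0.25, 0.25, 0.5]      # policy puts mass on the unseen pair

true_in = float(np.dot(d_pi_in, reward))
true_out = float(np.dot(d_pi_out, reward))

print(mis_estimate(d_pi_in, d_data, reward), true_in)
print(mis_estimate(d_pi_out, d_data, reward), true_out)
print(uncovered_mass(d_pi_out, d_data))
```

When the policy occupancy stays within the dataset's support, the estimate matches the true expected reward; when it does not, the estimate is biased by exactly the reward carried on the uncovered mass. Constraining the occupancy distribution, as the abstract describes, is one way to keep that uncovered mass under control.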

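This paper proposes a new training algorithm called Conservative Density Estimation (CDE), which addresses the extrapolation error problem in offline reinforcement learning by explicitly imposing constraints on the state-action occupancy stationary distribution. The method achieves state-of-the-art performance in settings with sparse rewards or insufficient data. On challenging tasks, CDE consistently outperforms the baselines, demonstrating its advantage in addressing extrapolation error in offline reinforcement learning.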

Learning from Sparse Offline Datasets via Conservative Density Estimation