Offline reinforcement learning (ORL) has gained attention as a way to train reinforcement learning models from pre-collected, static data. To address limited data and improve downstream ORL performance, recent work has tried to expand the dataset's coverage through data augmentation. However, most of these methods are policy-dependent: the generated data is only guaranteed to support the current downstream ORL policy, which limits its use with other downstream policies. Moreover, the quality of the synthetic data is often not well controlled, which limits how much it can further improve the downstream policy. To tackle these issues, we propose \textbf{HI}gh-quality \textbf{PO}licy-\textbf{DE}coupled~(HIPODE), a novel data augmentation method for ORL. On the one hand, HIPODE generates high-quality synthetic data by using the negative sampling technique to select, among candidate states, those that lie near the dataset distribution and have potentially high value. On the other hand, HIPODE is policy-decoupled and can therefore be used as a common plug-in method for any downstream ORL process. We conduct experiments on the widely studied TD3BC and CQL algorithms, and the results show that HIPODE outperforms the state-of-the-art policy-decoupled data augmentation method and most prevalent model-based ORL methods on D4RL benchmarks.
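
The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `nearest_distance` and `select_candidate`, the Euclidean nearest-neighbour test for "near the dataset distribution", the threshold `max_dist`, and the toy `value_fn` are all assumptions made for illustration. In particular, the sketch takes a value function as given and does not show how negative sampling would be used to train it.

```python
import math


def nearest_distance(state, dataset_states):
    """Euclidean distance from `state` to its nearest neighbour in the dataset.

    Assumption: nearest-neighbour distance stands in for "near the dataset
    distribution"; the abstract does not specify the actual criterion.
    """
    return min(math.dist(state, s) for s in dataset_states)


def select_candidate(candidates, dataset_states, value_fn, max_dist):
    """Return the highest-value candidate within `max_dist` of the dataset.

    Returns None when no candidate is close enough to the dataset.
    `value_fn` and `max_dist` are hypothetical stand-ins for a learned
    state-value estimate and a closeness threshold.
    """
    near = [
        c for c in candidates
        if nearest_distance(c, dataset_states) <= max_dist
    ]
    if not near:
        return None
    return max(near, key=value_fn)


# Toy usage: two dataset states, three candidate next states.
dataset_states = [(0.0, 0.0), (1.0, 0.0)]
candidates = [(0.1, 0.0), (0.9, 0.1), (5.0, 5.0)]


def value_fn(state):
    # Hypothetical value estimate: the sum of the coordinates.
    return state[0] + state[1]


chosen = select_candidate(candidates, dataset_states, value_fn, max_dist=0.5)
print(chosen)
```

In this toy setup the candidate `(5.0, 5.0)` has the largest value but is filtered out because it lies far from every dataset state, so the choice is made among the two nearby candidates only. Because the selection never consults a policy, the same augmented data could in principle be handed to any downstream learner, which is the sense of "policy-decoupled" in the abstract.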

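This paper proposes HIPODE, a data augmentation method that can serve as a general plug-in for any offline reinforcement learning process. It uses the negative sampling technique to select, among candidate states, those with potentially high value in order to generate high-quality synthetic data, and it outperforms the state-of-the-art policy-decoupled data augmentation method and most prevalent model-based offline reinforcement learning methods on the D4RL benchmarks.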

HIPODE: Enhancing Offline Reinforcement Learning with High-Quality Synthetic Data from a Policy-Decoupled Approach