With high-dimensional state spaces, visual reinforcement learning (RL) faces significant challenges in exploitation and exploration, resulting in low sample efficiency and unstable training. Consistency models, a time-efficient class of diffusion models, have been validated in online state-based RL, but whether they can be extended to visual RL remains an open question. In this paper, we investigate the impact of the non-stationary distribution and the actor-critic framework on consistency policy in online RL, and find that the consistency policy is unstable during training, especially in visual RL with its high-dimensional state space. To address this, we suggest sample-based entropy regularization to stabilize policy training, and propose a consistency policy with prioritized proximal experience regularization (CP3ER) to improve sample efficiency. CP3ER achieves new state-of-the-art (SOTA) performance on 21 tasks across the DeepMind control suite and Meta-world. To our knowledge, CP3ER is the first method to apply diffusion/consistency models to visual RL, and it demonstrates the potential of consistency models in this setting. More visualization results are available at https://jzndd.github.io/CP3ER-Page/.

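This work addresses the low sample efficiency and training instability of visual reinforcement learning by proposing a sample-based entropy regularization method that stabilizes policy training. The resulting consistency policy with prioritized proximal experience regularization (CP3ER) achieves new state-of-the-art (SOTA) performance on 21 tasks across the DeepMind control suite and Meta-world. It is the first application of consistency models to visual reinforcement learning and demonstrates their potential.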

Generalizing Consistency Policy to Visual Reinforcement Learning with Prioritized Proximal Experience Regularization