As an important framework for safe Reinforcement Learning, the Constrained
Markov Decision Process (CMDP) has been extensively studied in the recent
literature. However, despite the rich results under various on-policy learning
settings, essential understanding of the offline CMDP problem is still
lacking, in terms of both the algorithm design and the