As an important framework for safe reinforcement learning, the constrained Markov decision process (CMDP) has been extensively studied in the recent literature. However, despite the rich results under various on-policy learning settings, there is still a lack of essential understanding o