In many sequential decision-making problems, the goal is to optimize a utility function while satisfying a set of constraints on different utilities. This learning problem is formalized through constrained markov decision processes (CMDPs). In this paper, we investigate the