Safety in goal-directed reinforcement learning (RL) settings has typically
been handled through constraints over trajectories, an approach that has
demonstrated good performance primarily in short-horizon tasks (where the
goal is not too far away). In
this paper, we are specifically interested in the problem