Real-world reinforcement learning (RL) is often severely limited because
typical RL algorithms rely heavily on a reset mechanism to sample suitable
initial states. In practice, the reset mechanism is expensive to implement due
to the need for human intervention or heavily engineered envi