Model-free deep reinforcement learning (RL) has demonstrated its superiority
on many complex sequential decision-making problems. However, heavy dependence
on dense rewards and high sample-complexity impedes the wide adoption of these
methods in real-world scenarios. On the other hand,