Recent studies in offline reinforcement learning (RL) have made considerable
progress toward making RL usable in real-world systems by learning policies from
pre-collected datasets without environment interaction. Unfortunately, existing
offline RL methods still face many practical challenges i