Offline reinforcement learning (RL) defines a sample-efficient learning
paradigm in which a policy is learned from static, previously collected
datasets without additional interaction with the environment. The major
obstacle to offline RL is the estimation error arising from evaluatin