Offline reinforcement learning (RL) aims to learn policies from static
datasets of previously collected trajectories. Existing methods for offline RL
either constrain the learned policy to the support of offline data or utilize
model-based virtual environments to generate simulated rol