model-based offline reinforcement learning (RL), which builds a supervised
transition model with logging dataset to avoid costly interactions with the
online environment, has been a promising approach for offline policy
optimization. As the discrepancy between the logging data and onli