Model-based offline reinforcement learning (RL) aims to find a highly rewarding
policy by leveraging a previously collected static dataset and a dynamics
model. While the dynamics model is learned by reusing the static dataset,
its generalization ability hopefully promotes policy lea