Offline reinforcement learning (RL) aims to learn an optimal policy from a batch of previously collected data, without further interaction with the environment during training. Offline RL attempts to avoid hazardous executions in the environment, and thus will greatly broaden the scope of R