In high-stakes scenarios such as medical treatment and auto-piloting, collecting online experimental data to train the agent is risky or even infeasible. Simulation-based training can alleviate this issue, but it may suffer from inherent mismatches between the simulator and the real environment.