As a framework for sequential decision-making, reinforcement learning (RL)
has been regarded as an essential component leading to Artificial General
Intelligence (AGI). However, RL is often criticized for having the same
training environment as the test one, which also hinders its appl