In classic reinforcement learning algorithms, agents make decisions at
discrete time steps separated by a fixed interval. The physical duration between one decision
and the next becomes a critical hyperparameter. When this duration is too
short, the agent needs to make many decisions to achieve its go