In reinforcement learning, it is common to let an agent interact with its
environment for a fixed amount of time before resetting the environment and
repeating the process in a series of episodes. The task that the agent has to
learn can
either be to maximize its performance over (i) that fixed per