Most reinforcement learning algorithms take advantage of an experience replay
buffer to repeatedly train on samples the agent has observed in the past. Not
all samples carry the same amount of significance and simply assigning equal
importance to each of the samples is a naïve strategy