Selecting exploratory actions that generate a rich stream of experience for
better learning is a fundamental challenge in reinforcement learning (RL). An
approach to tackle this problem consists in selecting actions according to
specific policies for an extended period of time, also kn