We study the problem of exploration in Reinforcement Learning and present a novel model-free solution. We adopt an information-theoretical viewpoint and start from the instance-specific lower bound of the number of samples that have to be collected to identify a nearly-optimal policy. Deriving this lower bound along with the optimal exploration strategy entails solving an intricate optimization problem and requires a model of the system. In turn, most existing sample optimal exploration algorithms rely on estimating the model. We derive an approximation of the instance-specific lower bound that only involves quantities that can be inferred using model-free approaches. Leveraging this approximation, we devise an ensemble-based model-free exploration strategy applicable to both tabular and continuous Markov decision processes. Numerical results demonstrate that our strategy is able to identify efficient policies faster than state-of-the-art exploration approaches

采用信息论的观点，我们研究强化学习中的探索问题，并提出了一种新颖的无模型解决方案，通过推导实例特定的下界以及最优的探索策略，我们衍生出一种基于集成模型的无模型探索策略，适用于表格和连续马可夫决策过程， 数值结果表明我们的策略能够比最先进的探索方法更快地找到高效的策略。

强化学习中的无模型主动探索