Efficient exploration is one of the key challenges for reinforcement learning (RL) algorithms. Most traditional sample efficiency bounds require strategic exploration. Recently, however, many deep RL algorithms that use simple heuristic exploration strategies with few formal guarantees have achieved surprising success in many domains. These results raise an important question: how should we understand exploration strategies such as $\epsilon$-greedy, and what characterizes the difficulty of exploration in MDPs? In this work we propose problem-specific sample complexity bounds for $Q$-learning with random walk exploration that rely on several structural properties. We also link our theoretical results to some empirical benchmark domains, to illustrate whether our bound yields polynomial sample complexity in these domains and how this relates to the empirical performance observed there.
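To make "$Q$-learning with random walk exploration" concrete, here is a minimal sketch, assuming a toy deterministic chain MDP that is not taken from the paper. The behavior policy picks actions uniformly at random (the $\epsilon = 1$ case of $\epsilon$-greedy), while the standard off-policy $Q$-learning update still learns values for the greedy policy. The chain, the helper names `chain_step` and `q_learning_random_walk`, and the parameters `alpha`, `gamma`, and `steps` are all hypothetical choices for illustration only.

```python
import random


def chain_step(state, action, n):
    """Hypothetical n-state chain MDP (illustration only).

    Action 1 moves right, action 0 moves left (clamped at state 0).
    Entering the rightmost state n-1 yields reward 1 and ends the episode.
    """
    nxt = state + 1 if action == 1 else max(state - 1, 0)
    done = nxt == n - 1
    reward = 1.0 if done else 0.0
    return nxt, reward, done


def q_learning_random_walk(n=5, steps=20000, alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning whose behavior policy is a pure random walk."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n)]  # q[state][action]
    state = 0
    for _ in range(steps):
        # Random walk exploration: uniform action choice, i.e. epsilon = 1.
        action = rng.randrange(2)
        nxt, reward, done = chain_step(state, action, n)
        # Off-policy Q-learning target: bootstrap from the greedy value
        # of the next state unless the episode has terminated.
        target = reward if done else reward + gamma * max(q[nxt])
        q[state][action] += alpha * (target - q[state][action])
        # Reset to the start state after reaching the goal.
        state = 0 if done else nxt
    return q


q = q_learning_random_walk()
# Greedy action in each non-terminal state (1 = move right).
greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(len(q) - 1)]
print(greedy)
```

On this small chain the greedy policy extracted from `q` moves right in every non-terminal state. Increasing `n` lengthens the time the random walk needs to first reach the rewarding state, which is the kind of MDP-dependent quantity that problem-specific bounds for random exploration have to account for.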

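This work proposes problem-specific sample complexity bounds for $Q$-learning with random walk exploration that depend on several structural properties, and links the theoretical results to some empirical benchmark domains to illustrate whether our bound yields polynomial sample complexity in these domains and how this relates to empirical performance.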

When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms