We prove a fundamental limitation on the efficiency of a wide class of Reinforcement Learning (RL) algorithms. This limitation applies to model-free RL methods as well as to a broad range of model-based methods, such as planning with tree search. Under an abstract definition of this class, we provide a family of RL problems for which these methods need a number of interactions with the environment that is at least exponential in the horizon to find an optimal behavior. However, there exists a method, not tailored to this specific family of problems, that solves the problems in the family efficiently. The limitation does not apply to several types of methods proposed in the literature, for instance goal-conditioned methods or other algorithms that construct an inverse dynamics model.

An Efficiency Separation between Reinforcement Learning Methods: Model-Free, Model-Based, and Goal-Conditioned