Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited

Oct 2020

Omar Darwiche Domingues, Pierre Ménard, Emilie Kaufmann, Michal Valko

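TL;DR: This paper proposes and proves new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a focus on the case of non-stationary transition kernels.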

Abstract

In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular foc