Kaiqing Zhang, Sham M. Kakade, Tamer Başar, Lin F. Yang
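TL;DR: This paper studies the sample complexity of model-based reinforcement learning in multi-agent settings. Analyzing two-player zero-sum Markov games, it finds that the model-based approach achieves a sample complexity of $O(SA(1-\gamma)^{-3}\epsilon^{-2})$, which improves on other methods; however, the bound depends on the size of the action space, which is a limitation.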
Abstract
Model-based reinforcement learning (RL), which finds an optimal policy using an empirical model, has long been recognized as one of the cornerstones of RL. It is especially suitable for multi-agent RL (MARL), as