Feb 2021
Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis
Gen Li, Changxiao Cai, Yuxin Chen, Yuting Wei, Yuejie Chi
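TL;DR: This paper studies the sample complexity and sub-optimality of Q-learning in both the synchronous and asynchronous settings. It establishes a sharper sample complexity bound for the asynchronous setting and shows that Q-learning is strictly sub-optimal, that is, not minimax optimal.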