BriefGPT.xyz
Feb, 2021
Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis
Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning
Gen Li, Changxiao Cai, Yuxin Chen, Yuantao Gu, Yuting Wei...
TL;DR
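This paper studies the sample complexity and sub-optimality of Q-learning in the synchronous and asynchronous settings. It shows a sharper sample complexity bound in the asynchronous setting, and that the Q-learning algorithm is strictly sub-optimal.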
Abstract
Q-learning, which seeks to learn the optimal Q-function of a Markov decision process (MDP) in a model-free fashion, lies at the heart of reinforcement learning. When it comes to the synchronous setting (such that …
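To make the object of study concrete, here is a minimal sketch of tabular Q-learning in a synchronous style. It assumes the usual meaning of that setting, in which a generative model supplies one independent next-state sample for every state-action pair in each iteration. The function name `synchronous_q_learning`, the nested-list layout of `P` and `R`, and the rescaled linear step size are illustrative assumptions, not details taken from the abstract.

```python
import random


def synchronous_q_learning(P, R, gamma=0.9, num_iters=2000, seed=0):
    """Tabular synchronous Q-learning (illustrative sketch).

    P[s][a] is a list of next-state probabilities and R[s][a] is the
    immediate reward; both names and layouts are hypothetical.
    """
    rng = random.Random(seed)
    num_states, num_actions = len(R), len(R[0])
    Q = [[0.0] * num_actions for _ in range(num_states)]

    for t in range(1, num_iters + 1):
        # Rescaled linear step size; this schedule is an assumption.
        eta = 1.0 / (1.0 + (1.0 - gamma) * t)
        new_Q = [[0.0] * num_actions for _ in range(num_states)]
        for s in range(num_states):
            for a in range(num_actions):
                # One fresh sample per (s, a) pair from the generative model.
                s_next = rng.choices(range(num_states), weights=P[s][a])[0]
                # Empirical Bellman target built from the previous iterate.
                target = R[s][a] + gamma * max(Q[s_next])
                new_Q[s][a] = (1.0 - eta) * Q[s][a] + eta * target
        Q = new_Q
    return Q
```

On a one-state, one-action MDP with reward 1 and `gamma=0.5`, the optimal Q-value is 1 / (1 - 0.5) = 2, which gives a quick sanity check for the update rule.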