BriefGPT.xyz
Jun, 2020
Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
Gen Li, Yuting Wei, Yuejie Chi, Yuantao Gu, Yuxin Chen
TL;DR
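This work studies asynchronous Q-learning, which learns the optimal action-value function from a sample trajectory of a Markov decision process. It gives an ℓ∞-based sample complexity analysis and, building on it, proposes a new variance-reduction technique that further improves the algorithm's efficiency.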
Abstract
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy. Focusing …
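To make the setting in the abstract concrete, here is a minimal sketch of tabular asynchronous Q-learning run along one Markovian trajectory. Everything specific in it is an assumption for illustration and is not taken from the paper: the two-state toy MDP (`P`, `R`), the uniform behavior policy, the constant learning rate `lr`, and the helper names. It does not implement the variance-reduced variant; it only shows the basic update in which a single entry of the Q-table changes per sample, and it measures the ℓ∞ error against a Q-function computed by value iteration.

```python
import random


def async_q_learning(P, R, gamma, num_steps, lr, seed=0):
    """Tabular asynchronous Q-learning along a single trajectory.

    P[s][a] is a list of next-state probabilities; R[s][a] is a reward in [0, 1].
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    states = range(n_states)
    s = 0
    for _ in range(num_steps):
        # Behavior policy: uniform over actions (an assumption of this sketch).
        a = rng.randrange(n_actions)
        s_next = rng.choices(states, weights=P[s][a])[0]
        # Asynchronous update: only the visited (s, a) entry is modified.
        target = R[s][a] + gamma * max(Q[s_next])
        Q[s][a] += lr * (target - Q[s][a])
        s = s_next
    return Q


def value_iteration(P, R, gamma, iters=2000):
    """Reference Q-function from repeated Bellman optimality updates."""
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        V = [max(row) for row in Q]
        Q = [
            [R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
             for a in range(n_actions)]
            for s in range(n_states)
        ]
    return Q


def sup_norm_error(Q1, Q2):
    """Entrywise ℓ∞ distance between two Q-tables."""
    return max(abs(x - y) for r1, r2 in zip(Q1, Q2) for x, y in zip(r1, r2))


# Hypothetical toy MDP: 2 states, 2 actions.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
R = [[0.0, 0.5],
     [1.0, 0.2]]
GAMMA = 0.9

q_star = value_iteration(P, R, GAMMA)
q_hat = async_q_learning(P, R, GAMMA, num_steps=100_000, lr=0.02)
err = sup_norm_error(q_hat, q_star)
print(f"l-infinity error after 100000 samples: {err:.4f}")
```

With a constant learning rate the estimate settles into a noisy neighborhood of the reference Q-function rather than converging exactly. How many samples are needed to reach a given ℓ∞ accuracy is the sample complexity question the paper analyzes.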