BriefGPT.xyz
Oct, 2023
Optimal Sample Complexity for Average Reward Markov Decision Processes
Shengbo Wang, Jose Blanchet, Peter Glynn
TL;DR
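For uniformly ergodic Markov decision processes (MDPs), we construct an estimator of the optimal policy for average-reward MDPs whose sample complexity attains the lower bound in the literature, building on algorithmic ideas from Jin and Sidford (2021) and Li et al. (2020).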
Abstract
We settle the sample complexity of policy learning for the maximization of the long-run average reward associated with a uniformly ergodic Markov …