BriefGPT.xyz
Sep, 2020
Adaptive Sampling for Best Policy Identification in Markov Decision Processes
Best Policy Identification in discounted MDPs: Problem-specific Sample Complexity
Aymen Al Marjani, Alexandre Proutiere
TL;DR
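This paper studies how to identify the optimal policy in a Markov decision process using a generative model. It proposes the KLB-TS algorithm and provides asymptotic guarantees on its sample complexity.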
Abstract
We investigate the problem of best-policy identification in discounted Markov decision processes (MDPs) with finite state and action spaces. We assume that the agent has access to a
→