BriefGPT.xyz
May, 2021
A Theoretical Study of Robust Markov Decision Processes: Sample Complexity and Asymptotics
Non-asymptotic Performances of Robust Markov Decision Processes
Wenhao Yang, Zhihua Zhang
TL;DR
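This paper studies the non-asymptotic and asymptotic performance of the optimal robust policy and value function of robust Markov decision processes under several different uncertainty sets. Experiments verify that the optimal robust value function exhibits asymptotic normality at the typical √n rate, both in theory and in practice.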
Abstract
In this paper, we study the non-asymptotic performance of optimal policy on robust value function with true transition dynamics. The optimal robust policy is solved from a generative model or offline dataset with