Jun, 2020
Partial Policy Iteration for L1-Robust Markov Decision Processes
Chin Pang Ho, Marek Petrik, Wolfram Wiesemann
TL;DR
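This paper studies how to efficiently solve robust MDPs with s-rectangular and sa-rectangular ambiguity sets when the transition probabilities are uncertain. It proposes a new policy iteration scheme together with fast methods for computing the robust Bellman operator. Experiments show that these methods are much faster than existing approaches that combine robust value iteration with a linear programming solver.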
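To make the term "robust Bellman operator" concrete, here is a minimal sketch for the sa-rectangular case with an L1 ball around a nominal distribution. It uses the well-known greedy mass-shifting scheme for the inner worst-case problem and is not taken from the paper, whose own algorithms are not shown in this summary. The function names `worst_case_l1` and `robust_bellman_sa`, the argument layout, and the budget parameter `kappa` are illustrative assumptions.

```python
def worst_case_l1(z, p_bar, kappa):
    """Minimize sum(p[i] * z[i]) over distributions p with ||p - p_bar||_1 <= kappa.

    Assumption: p_bar is a valid probability vector and kappa >= 0.
    Returns the worst-case value and the minimizing distribution.
    """
    n = len(z)
    p = list(p_bar)
    i_min = min(range(n), key=lambda i: z[i])
    # Shifting eps of mass changes the L1 distance by 2 * eps.
    eps = min(kappa / 2.0, 1.0 - p[i_min])
    p[i_min] += eps
    # Remove the same amount of mass, starting from the highest-value states.
    for i in sorted(range(n), key=lambda i: z[i], reverse=True):
        if eps <= 0.0:
            break
        if i == i_min:
            continue
        delta = min(eps, p[i])
        p[i] -= delta
        eps -= delta
    return sum(pi * zi for pi, zi in zip(p, z)), p


def robust_bellman_sa(v, rewards, p_bar, kappa, gamma):
    """One sa-rectangular robust Bellman update for a single state.

    Hypothetical layout: rewards[a][j] and p_bar[a][j] are indexed by
    action a and next state j; kappa[a] is the L1 budget of action a.
    """
    best = float("-inf")
    for a in range(len(p_bar)):
        # Value of landing in each next state under action a.
        z = [rewards[a][j] + gamma * v[j] for j in range(len(v))]
        q, _ = worst_case_l1(z, p_bar[a], kappa[a])
        best = max(best, q)
    return best
```

For example, with values `[1.0, 2.0, 3.0]`, nominal distribution `[0.2, 0.3, 0.5]`, and `kappa = 0.4`, the sketch moves 0.2 of probability mass from the best state to the worst one, which lowers the expected value from the nominal 2.3 to roughly 1.9.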
Abstract
Robust Markov decision processes (MDPs) allow one to compute reliable solutions for dynamic decision problems whose evolution is modeled by rewards and partially known transition probabilities. Unfortunately, accounting for ...