Markov decision processes (MDPs) are a standard model for sequential decision-making problems and are widely used across many scientific areas, including formal methods and artificial intelligence (AI). MDPs do, however, come with the restrictive assumption that the transition probabilities need to be precisely known. Robust MDPs (RMDPs) overcome this assumption by instead defining the transition probabilities to belong to some uncertainty set. We present a gentle survey on RMDPs, providing a tutorial covering their fundamentals. In particular, we discuss RMDP semantics and how to solve them by extending standard MDP methods such as value iteration and policy iteration. We also discuss how RMDPs relate to other models and how they are used in several contexts, including reinforcement learning and abstraction techniques. We conclude with some challenges for future work on RMDPs.
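To make the extension of value iteration to RMDPs concrete, the following is a minimal sketch, not the survey's own algorithm. It assumes three things that the abstract does not fix: the uncertainty set is given by independent probability intervals per state-action pair, the objective is discounted reward maximization, and nature picks the worst distribution for the agent. All names (`worst_case_expectation`, `robust_value_iteration`, the model layout) are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical model layout: intervals[s][a] = [(successor, lower, upper), ...]
Successors = List[Tuple[int, float, float]]
Intervals = Dict[int, Dict[str, Successors]]


def worst_case_expectation(succ: Successors, values: Dict[int, float]) -> float:
    """Minimise sum_s' p(s') * V(s') over all distributions p with
    lower <= p(s') <= upper. Assumes the intervals admit a distribution."""
    # Start from the lower bounds, then hand out the remaining mass
    # to the successors with the smallest value first.
    probs = {s: lo for s, lo, _ in succ}
    budget = 1.0 - sum(probs.values())
    for s, lo, hi in sorted(succ, key=lambda t: values[t[0]]):
        extra = min(hi - lo, budget)
        probs[s] += extra
        budget -= extra
    return sum(p * values[s] for s, p in probs.items())


def robust_value_iteration(
    intervals: Intervals,
    rewards: Dict[int, Dict[str, float]],
    gamma: float = 0.9,
    tol: float = 1e-8,
    max_iter: int = 10_000,
) -> Dict[int, float]:
    """Standard value iteration, except that the expectation over successors
    is replaced by the worst case over the interval uncertainty set."""
    values = {s: 0.0 for s in intervals}
    for _ in range(max_iter):
        new_values = {
            s: max(
                rewards[s][a] + gamma * worst_case_expectation(succ, values)
                for a, succ in actions.items()
            )
            for s, actions in intervals.items()
        }
        if max(abs(new_values[s] - values[s]) for s in values) < tol:
            return new_values
        values = new_values
    return values


# Illustrative two-state model: state 1 is absorbing and pays reward 1,
# state 0 reaches state 1 with a probability somewhere in [0.4, 0.8].
example_intervals: Intervals = {
    0: {"go": [(0, 0.2, 0.6), (1, 0.4, 0.8)]},
    1: {"stay": [(1, 1.0, 1.0)]},
}
example_rewards = {0: {"go": 0.0}, 1: {"stay": 1.0}}
example_values = robust_value_iteration(example_intervals, example_rewards)
```

The only change from ordinary value iteration is the inner minimization. For interval uncertainty it can be solved greedily by sorting successors, which is why this sketch needs no optimization library. Other uncertainty sets generally require a different inner solver.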

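This work addresses a limitation of conventional Markov decision processes, namely the restrictive assumption that transition probabilities must be precisely known. It surveys robust Markov decision processes (RMDPs), which instead allow the transition probabilities to belong to an uncertainty set. The article provides a tutorial on the fundamentals of RMDPs, discusses their semantics, solution methods, and relation to other models, and shows how RMDPs are applied in several areas, including reinforcement learning and abstraction techniques.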

Robust Markov Decision Processes: Where Artificial Intelligence and Formal Methods Meet