We consider large-scale Markov decision processes (MDPs) with parameter
uncertainty, under the robust MDP paradigm. Previous studies showed that robust
MDPs, based on a minimax approach to handle uncertainty, can be solved using
dynamic programming for small to medium sized problems. However, due to the
"curse of dimensionality", MDPs that model real-life problems are typically
prohibitively large for such approaches. In this work we employ a reinforcement
learning approach to tackle this planning problem: we develop a robust
approximate dynamic programming method based on a projected fixed point
equation to approximately solve large scale robust MDPs. We show that the
proposed method provably succeeds under certain technical conditions, and
demonstrate its effectiveness through simulation of an option pricing problem.
To the best of our knowledge, this is the first attempt to scale up the robust
MDPs paradigm.

本文研究了面临参数不确定性的大规模马尔可夫决策过程（MDP），并基于鲁棒 MDP 范式，应用增强学习方法解决了规模巨大且无法使用动态规划技术的实际问题解决方法。该方法在特定技术条件下被证明可以成功，通过对期权定价问题的模拟的证明其有效性，是首次尝试扩大鲁棒 MDPs 范式的尝试。