In this paper, we focus on the problem of robustifying reinforcement learning (RL) algorithms with respect to model uncertainties. Indeed, in the framework of model-based RL, we propose to merge the theory of constrained Markov decision process (CMDP), with the theory of robust Markov decision process (RMDP), leading to a formulation of robust constrained-MDPs (RCMDP). This formulation, simple in essence, allows us to design RL algorithms that are robust in performance, and provides constraint satisfaction guarantees, with respect to uncertainties in the system's states transition probabilities. The need for RCMPDs is important for real-life applications of RL. For instance, such formulation can play an important role for policy transfer from simulation to real world (Sim2Real) in safety critical applications, which would benefit from performance and safety guarantees which are robust w.r.t model uncertainty. We first propose the general problem formulation under the concept of RCMDP, and then propose a Lagrangian formulation of the optimal problem, leading to a robust-constrained policy gradient RL algorithm. We finally validate this concept on the inventory management problem.

本文介绍了一个基于Constrained Markov Decision Process（CMDP）和Robust Markov Decision Process（RMDP）的框架，即Robust Constrained-MDPs（RCMDP），用于设计强大而稳健的强化学习算法，并提供相应的约束满足保证。同时，还将这个框架用于从模拟到真实世界的政策转移中，以实现对模型不确定性的强鲁棒性和安全保障。最后，我们在库存管理问题上验证了这个框架的有效性。

鲁棒受限制马尔科夫决策过程: 在模型不确定性下进行软受限制鲁棒策略优化