Robust Markov Decision Processes (RMDPs) have received significant research interest, offering an alternative to standard Markov Decision Processes (MDPs) that often assume fixed transition probabilities. RMDPs address this by optimizing for the worst-case scenarios within ambiguity sets. While earlier studies on RMDPs have largely centered on risk-neutral reinforcement learning (RL), with the goal of minimizing expected total discounted costs, in this paper, we analyze the robustness of CVaR-based risk-sensitive RL under RMDP. Firstly, we consider predetermined ambiguity sets. Based on the coherency of CVaR, we establish a connection between robustness and risk sensitivity, thus, techniques in risk-sensitive RL can be adopted to solve the proposed problem. Furthermore, motivated by the existence of decision-dependent uncertainty in real-world problems, we study problems with state-action-dependent ambiguity sets. To solve this, we define a new risk measure named NCVaR and build the equivalence of NCVaR optimization and robust CVaR optimization. We further propose value iteration algorithms and validate our approach in simulation experiments.

使用固定过渡概率的标准马尔科夫决策过程（MDPs）的替代方案，鲁棒马尔科夫决策过程（RMDPs）在不确定性集合中优化最坏情况下的结果。本文研究了在RMDP下基于CVaR的风险敏感强化学习的鲁棒性，分析了预先设定的不确定性集合和状态动作相关的不确定性集合，提出了风险度量NCVaR和相应的优化方法，并通过仿真实验验证了该方法的有效性。

具有条件风险价值的鲁棒风险敏感强化学习