Reinforcement Learning (RL) has achieved strong performance across a wide range of applications, enabling autonomous agents to learn optimal policies through interaction with their environments. However, traditional RL frameworks often struggle with iteration complexity and robustness. Risk-sensitive RL, which balances expected return against risk, has been explored for its potential to yield probabilistically robust policies, yet its iteration complexity remains underexplored. In this study, we conduct a thorough iteration complexity analysis of the risk-sensitive policy gradient method, focusing on the REINFORCE algorithm with the exponential utility function. We obtain an iteration complexity of $\mathcal{O}(\epsilon^{-2})$ for reaching an $\epsilon$-approximate first-order stationary point (FOSP). We then investigate whether risk-sensitive algorithms can achieve better iteration complexity than their risk-neutral counterparts. Our theoretical analysis shows that risk-sensitive REINFORCE can require fewer iterations to converge. Because employing the exponential utility entails no additional computation per iteration, this reduction translates directly into improved iteration complexity. We characterize the conditions under which risk-sensitive algorithms can achieve this improvement. Our simulation results also confirm that the risk-averse cases can converge and stabilize more quickly than their risk-neutral counterparts, settling after approximately half of the episodes.
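
As a rough illustration of the setting described above, the sketch below runs REINFORCE with an exponential-utility objective on a toy two-armed bandit. It assumes the objective $J_\beta(\theta) = \frac{1}{\beta}\mathbb{E}_{\pi_\theta}[e^{\beta R}]$, with $\beta < 0$ risk-averse, $\beta > 0$ risk-seeking, and the gradient reducing to the risk-neutral policy gradient as $\beta \to 0$; the paper's exact normalization, step sizes, and estimator may differ. The bandit and the names `pull`, `utility_weight`, `grad_estimate`, and `train` are hypothetical and introduced only for this example. Note that the only change from risk-neutral REINFORCE is the per-sample weight, one exponential per sampled return, which is in the spirit of the claim that the exponential utility adds no extra per-iteration work.

```python
import math
import random

# Hypothetical two-armed bandit (a one-step MDP), for illustration only:
# arm 0 pays 1.0 deterministically; arm 1 pays 0.0 or 2.2 with equal
# probability (higher mean 1.1, but risky).
def pull(arm, rng):
    return 1.0 if arm == 0 else rng.choice([0.0, 2.2])

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def utility_weight(ret, beta):
    # (exp(beta * R) - 1) / beta, which tends to R as beta -> 0 (risk-neutral).
    # The "- 1" acts as a constant baseline: the expected gradient is unchanged.
    return ret if beta == 0.0 else math.expm1(beta * ret) / beta

def grad_estimate(logits, beta, batch_size, rng):
    """For beta != 0, an unbiased Monte Carlo estimate of the gradient of
    (1/beta) E[exp(beta * R)]; for beta == 0, of E[R] (vanilla REINFORCE)."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(batch_size):
        arm = rng.choices(range(len(probs)), weights=probs)[0]
        w = utility_weight(pull(arm, rng), beta)
        # Softmax score function: d log pi(arm) / d theta_k = 1[k == arm] - pi_k
        for k, p in enumerate(probs):
            grad[k] += w * ((1.0 if k == arm else 0.0) - p) / batch_size
    return grad

def train(beta, iters=200, batch_size=256, lr=0.5, seed=0):
    """Plain stochastic gradient ascent on the logits; returns final action probabilities."""
    rng = random.Random(seed)
    logits = [0.0, 0.0]
    for _ in range(iters):
        g = grad_estimate(logits, beta, batch_size, rng)
        logits = [t + lr * gk for t, gk in zip(logits, g)]
    return softmax(logits)

if __name__ == "__main__":
    for beta in (0.0, -2.0):
        probs = train(beta)
        print(f"beta={beta:+.1f}  pi(safe)={probs[0]:.3f}  pi(risky)={probs[1]:.3f}")
```

With these hypothetical settings, the risk-neutral run ($\beta = 0$) should drift toward the risky arm, whose mean is higher, while $\beta = -2$ should favor the safe arm; the exact probabilities depend on the seed, batch size, and step size.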

Towards Efficient Risk-Sensitive Policy Gradient: An Iteration Complexity Analysis