Risk-sensitive reinforcement learning (RL) is crucial for maintaining
reliable performance in many high-stakes applications. While most RL methods
aim to learn a point estimate of the random cumulative cost, distributional RL
(DRL) seeks to estimate its entire distribution. The distribution
provides all necessary information about the cost and leads to a unified
framework for handling various risk measures in a risk-sensitive setting.
However, developing policy gradient methods for risk-sensitive DRL is
inherently more complex because it requires the gradient of a probability
measure. This paper introduces a policy gradient method for risk-sensitive DRL
with general coherent risk measures, where we provide an analytical form of the
probability measure's gradient. We further prove the local convergence of the
proposed algorithm under mild smoothness assumptions. For practical use, we
also design a categorical distributional policy gradient algorithm (CDPG) based
on categorical distributional policy evaluation and trajectory-based gradient
estimation. Through experiments on a stochastic cliff-walking environment, we
illustrate the benefits of considering a risk-sensitive setting in DRL.
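
As a small illustration of how a risk measure can be read off a categorical cost distribution, the sketch below computes conditional value-at-risk (CVaR), one well-known coherent risk measure. It is not the CDPG algorithm itself. The function name `categorical_cvar`, the representation of the distribution as a list of cost atoms with matching probabilities, and the convention that `alpha` denotes the fraction of worst (highest-cost) outcomes are all assumptions made for this example.

```python
def categorical_cvar(atoms, probs, alpha):
    """CVaR of a categorical cost distribution (hypothetical helper).

    atoms: possible cumulative costs (higher is worse).
    probs: probability of each atom; assumed to sum to 1.
    alpha: tail fraction in (0, 1]; alpha = 1 recovers the plain mean.
    """
    if len(atoms) != len(probs):
        raise ValueError("atoms and probs must have the same length")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")

    remaining = alpha   # tail probability mass still to be collected
    tail_sum = 0.0      # probability-weighted cost collected so far

    # Walk through the atoms from the highest cost downwards and take
    # probability mass until a total of alpha has been collected.
    for cost, p in sorted(zip(atoms, probs), reverse=True):
        take = min(p, remaining)
        tail_sum += take * cost
        remaining -= take
        if remaining <= 0.0:
            break

    # Normalise by alpha to obtain the mean cost within the tail.
    return tail_sum / alpha


if __name__ == "__main__":
    atoms = [0.0, 1.0, 10.0]
    probs = [0.5, 0.4, 0.1]
    print(categorical_cvar(atoms, probs, alpha=1.0))  # risk-neutral mean
    print(categorical_cvar(atoms, probs, alpha=0.1))  # worst 10% of outcomes
```

With a small `alpha`, the value is dominated by the rare high-cost atom, which is the kind of outcome a risk-sensitive policy would try to avoid; a risk-neutral mean largely averages it away.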

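This paper introduces a policy gradient method for risk-sensitive distributional reinforcement learning, together with a categorical distributional policy gradient algorithm (CDPG) based on categorical distributional policy evaluation and trajectory-based gradient estimation. Experiments on a stochastic cliff-walking environment illustrate the benefits of accounting for risk sensitivity in distributional reinforcement learning.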