In the realm of reinforcement learning (RL), accounting for risk is crucial
for making decisions under uncertainty, particularly in applications where
safety and reliability are paramount. In this paper, we introduce a general
framework for Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL)
with static Lipschitz Risk Measures (LRM) and general function approximation.
Our framework covers a broad class of risk-sensitive RL (RSRL) problems and
facilitates both analysis of how estimation functions affect the effectiveness
of RSRL strategies and evaluation of their sample complexity. We design two
meta-algorithms: \texttt{RS-DisRL-M}, a model-based strategy for model-based
function approximation, and \texttt{RS-DisRL-V}, a model-free approach for
general value function approximation. Using novel estimation techniques based on
Least Squares Regression (LSR) and Maximum Likelihood Estimation (MLE) in
distributional RL with an augmented Markov Decision Process (MDP), we derive the
first $\widetilde{\mathcal{O}}(\sqrt{K})$ dependency of the regret upper bound
for RSRL with static LRM, a step towards statistically efficient algorithms
in this domain.

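This work introduces a Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL) framework with static Lipschitz risk measures and general function approximation. The framework is used to analyze how estimation functions affect the effectiveness and sample complexity of RSRL strategies. Two meta-algorithms are designed: \texttt{RS-DisRL-M} for model-based function approximation and \texttt{RS-DisRL-V} for general value function approximation. Using novel estimation techniques based on Least Squares Regression (LSR) and Maximum Likelihood Estimation (MLE), combined with distributional RL in an augmented Markov Decision Process (MDP), the work derives the first $\widetilde{\mathcal{O}}(\sqrt{K})$ dependency of the regret upper bound for RSRL with static Lipschitz risk measures.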
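
To make the notion of a static risk measure more concrete: such a measure is applied once to the distribution of the cumulative return, rather than recursively at every step. The sketch below is illustrative only and is not taken from the paper. It assumes Conditional Value-at-Risk (CVaR) as the example risk measure and estimates it from sampled returns; the function name `empirical_cvar` and the sample values are hypothetical, and the text above does not state whether or with what constant CVaR belongs to the LRM class.

```python
import numpy as np


def empirical_cvar(returns, alpha):
    """Average of the worst alpha-fraction of sampled returns (lower tail).

    Hypothetical helper for illustration; `returns` are sampled cumulative
    returns and `alpha` in (0, 1] is the tail level.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    x = np.sort(np.asarray(returns, dtype=float))  # ascending: worst outcomes first
    k = max(1, int(np.ceil(alpha * x.size)))       # number of tail samples to average
    return float(x[:k].mean())


# Two hypothetical policies with the same mean return but different tails.
safe = [2.0, 2.0, 3.0, 3.0]
risky = [0.0, 1.0, 4.0, 5.0]
cvar_safe = empirical_cvar(safe, alpha=0.5)
cvar_risky = empirical_cvar(risky, alpha=0.5)
```

With `alpha=1.0` the estimate reduces to the ordinary mean, which is the risk-neutral objective. Smaller values of `alpha` put more weight on bad outcomes, so the two policies above are ranked differently even though their means coincide.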