In risk-sensitive learning, one aims to find a hypothesis that minimizes a
risk-averse (or risk-seeking) measure of loss, instead of the standard expected
loss. In this paper, we propose to study the generalization properties of
risk-sensitive learning schemes whose optimand is described via optimized
certainty equivalents (OCE): our general scheme can handle various known risks,
e.g., the entropic risk, mean-variance, and conditional value-at-risk, as
special cases. We provide two learning bounds on the performance of the empirical
OCE minimizer. The first result gives an OCE guarantee based on the Rademacher
average of the hypothesis space, which generalizes and improves existing
results on the expected loss and the conditional value-at-risk. The second
result, based on a novel variance-based characterization of OCE, gives an
expected loss guarantee with a suppressed dependence on the smoothness of the
selected OCE. Finally, we demonstrate the practical implications of the
proposed bounds via exploratory experiments on neural networks.
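
To make the OCE family concrete, the following is a minimal sketch (not the authors' code) of an empirical OCE computed from a finite sample of losses. It assumes the common definition OCE(X) = inf over lam of lam + E[phi(X - lam)] for a convex phi, and the usual choices of phi for the special cases named above; the function names, the ternary-search solver, and the parameter names gamma, c, and alpha are illustrative assumptions.

```python
import math


def empirical_oce(losses, phi, iters=200):
    """Empirical OCE: minimize lam + mean(phi(loss - lam)) over lam.

    The objective is convex in lam when phi is convex, so a ternary
    search over [min(losses), max(losses)] is enough for this sketch.
    """
    def objective(lam):
        return lam + sum(phi(x - lam) for x in losses) / len(losses)

    lo, hi = min(losses), max(losses)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) <= objective(m2):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    return objective((lo + hi) / 2)


# Assumed choices of phi for the special cases (hypothetical helper names).
def phi_expected(t):
    return t  # recovers the plain expected loss


def phi_entropic(gamma):
    return lambda t: (math.exp(gamma * t) - 1.0) / gamma


def phi_mean_variance(c):
    return lambda t: t + c * t * t


def phi_cvar(alpha):
    # Rockafellar-Uryasev form: average of the worst (1 - alpha) fraction
    return lambda t: max(t, 0.0) / (1.0 - alpha)


losses = [0.1, 0.4, 0.2, 1.5, 0.3]
mean = sum(losses) / len(losses)
var = sum((x - mean) ** 2 for x in losses) / len(losses)

oce_mean = empirical_oce(losses, phi_expected)
oce_entropic = empirical_oce(losses, phi_entropic(2.0))
oce_mv = empirical_oce(losses, phi_mean_variance(0.5))
oce_cvar = empirical_oce(losses, phi_cvar(0.8))
```

Under these assumptions, the entropic case should agree with the closed form (1/gamma) log E[exp(gamma X)], and the mean-variance case with E[X] + c Var(X), which gives a quick sanity check on the solver.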

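This paper studies the generalization properties of risk-sensitive learning schemes whose objective is described via optimized certainty equivalents (OCE), a family that covers a range of risk measures. It provides two learning bounds for the empirical OCE minimizer and demonstrates the practical implications of the proposed bounds through experiments on neural networks.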

Learning Bounds for Risk-sensitive Learning

In this paper, we present a new reinforcement learning (RL) algorithm called
Distributional Soft Actor Critic (DSAC), which exploits the distributional
information of accumulated rewards to achieve better performance. Seamlessly
integrating SAC (which uses entropy to encourage exploration) with a principled
distributional view of the underlying objective, DSAC takes into consideration
the randomness in both actions and rewards, and beats the state-of-the-art
baselines in several continuous control benchmarks. Moreover, with the
distributional information of rewards, we propose a unified framework for
risk-sensitive learning, one that goes beyond maximizing only expected
accumulated rewards. Under this framework we discuss three specific
risk-related metrics: percentile, mean-variance and distorted expectation. Our
extensive experiments demonstrate that with distribution modeling in RL, the
agent performs better for both risk-averse and risk-seeking control tasks.
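
As a rough illustration of the three risk-related metrics named above, the sketch below evaluates them on a small set of equally weighted samples from a return distribution. It is not DSAC's implementation: the function names, the nearest-rank percentile rule, the mean-minus-variance form, and the CVaR-style distortion function are all assumptions chosen for the example.

```python
import math


def percentile(returns, beta):
    """Empirical beta-quantile of the returns (nearest-rank rule)."""
    z = sorted(returns)
    k = max(1, math.ceil(beta * len(z)))
    return z[k - 1]


def mean_variance(returns, c):
    """Mean return penalized by c times the (population) variance."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / len(returns)
    return mean - c * var


def distorted_expectation(returns, g):
    """Expectation under a distortion g of the cumulative probabilities.

    g maps [0, 1] to [0, 1] with g(0) = 0 and g(1) = 1; the i-th smallest
    return receives weight g(i / n) - g((i - 1) / n).
    """
    z = sorted(returns)
    n = len(z)
    return sum(z[i] * (g((i + 1) / n) - g(i / n)) for i in range(n))


def cvar_distortion(alpha):
    # Puts all weight on the lowest alpha fraction of returns (risk-averse).
    return lambda tau: min(tau / alpha, 1.0)


sampled_returns = [4.0, -2.0, 1.0, 3.0]  # hypothetical return samples

median_return = percentile(sampled_returns, 0.5)
mv_score = mean_variance(sampled_returns, 0.1)
neutral = distorted_expectation(sampled_returns, lambda tau: tau)
risk_averse = distorted_expectation(sampled_returns, cvar_distortion(0.5))
```

With the identity distortion the result reduces to the ordinary mean, while a distortion that grows faster near zero shifts weight toward low returns and yields a risk-averse score; a risk-seeking agent would use the opposite shape.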

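DSAC is a new reinforcement learning algorithm that exploits the distributional information of accumulated rewards to achieve better performance. By seamlessly integrating SAC with a principled distributional view of the underlying objective, DSAC accounts for the randomness in both actions and rewards and outperforms state-of-the-art baselines on several continuous control benchmarks. In addition, the paper discusses three specific risk-related metrics (percentile, mean-variance, and distorted expectation) and achieves risk-sensitive RL through distribution modeling.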