Policy Gradient for Coherent Risk Measures

Feb, 2015

Aviv Tamar, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor

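TL;DR: This work broadens the scope of risk-sensitive reinforcement learning algorithms. It uses convex optimization and an actor-critic approach to handle dynamic risk measures, and proposes a unified treatment of risk-sensitive policy gradient methods.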

Abstract

We provide sampling-based algorithms for optimization under a coherent-risk objective. The class of coherent-risk measures is widely accepted in finance and operations research, among other fields, and encompasses popular risk-measures such as the conditional value at risk (CVaR) and the mean-semi-deviation. Our approach is suitable for problems in which the