BriefGPT.xyz
Dec, 2023
Efficient Off-Policy Safe Reinforcement Learning Using Trust Region Conditional Value at Risk
Dohyeong Kim, Songhwai Oh
TL;DR
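This paper proposes a safe reinforcement learning method with risk-based constraints. It introduces an adaptive trust region constraint to reduce the effect of distributional shift, which lets the agent perform well in complex environments while satisfying the safety constraints quickly.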
Abstract
This paper aims to solve a safe reinforcement learning (RL) problem with risk measure-based constraints. As risk measures, such as conditional value at risk (CVaR), focus on the tail distribution of cost signals,
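To show what it means for CVaR to "focus on the tail distribution of cost signals," here is a minimal sketch that estimates CVaR from a batch of sampled episode costs. It is a generic Monte Carlo estimator, not the paper's algorithm. It assumes that `alpha` is the tail fraction, so CVaR_alpha is the mean of the worst ceil(alpha * n) costs; conventions for alpha differ between papers. The function name `empirical_cvar` is hypothetical.

```python
import numpy as np


def empirical_cvar(costs, alpha):
    """Estimate CVaR_alpha from cost samples (hypothetical helper).

    Assumed convention: higher cost is worse, and alpha in (0, 1] is the
    fraction of the worst outcomes to average over.
    """
    costs = np.asarray(costs, dtype=float)
    if costs.ndim != 1 or costs.size == 0:
        raise ValueError("costs must be a non-empty 1-D sequence")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # Number of samples in the upper tail (at least one).
    k = max(1, int(np.ceil(alpha * costs.size)))
    # Sort ascending and keep the k largest costs, i.e. the worst outcomes.
    worst = np.sort(costs)[-k:]
    return float(worst.mean())


# Two cost distributions with the same mean (1.0) but different tails.
steady = [1.0] * 10
risky = [0.0] * 8 + [5.0, 5.0]
print(np.mean(steady), empirical_cvar(steady, 0.2))
print(np.mean(risky), empirical_cvar(risky, 0.2))
```

Both batches have an expected cost of 1.0, so a constraint on the expected cost alone treats them the same. Their CVaR at alpha = 0.2 is 1.0 versus 5.0, which is why a CVaR-based constraint penalizes rare high-cost episodes that an expectation-based constraint would miss.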