BriefGPT.xyz
May, 2024
Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints
Siow Meng Low, Akshat Kumar
TL;DR
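In safe reinforcement learning, we design a safety model that assesses how partial state-action trajectories contribute to safety, and use an RL-as-inference strategy to derive an effective algorithm for optimizing a safe policy. We also propose a method for dynamically adjusting the trade-off coefficient between reward maximization and safety compliance. Empirical results show that the approach scales and can satisfy complex non-Markovian safety constraints.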
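To make the idea of a dynamically adjusted trade-off coefficient concrete, here is a minimal sketch of a generic Lagrangian-style update. It is not the paper's algorithm, which this summary does not spell out. The function names, the learning rate `lr`, the cap `lam_max`, and the use of an empirical unsafe-trajectory rate as the safety signal are all assumptions made for illustration.

```python
def update_tradeoff(lam, unsafe_rate, target_rate, lr=0.1, lam_max=100.0):
    """One projected dual-ascent step on the trade-off coefficient (hypothetical).

    lam         : current coefficient weighting safety against reward
    unsafe_rate : fraction of recent trajectories judged unsafe (assumed signal)
    target_rate : tolerated fraction of unsafe trajectories
    """
    # Raise lam when the policy is less safe than the target, lower it otherwise.
    lam = lam + lr * (unsafe_rate - target_rate)
    # Project back onto [0, lam_max] so the coefficient stays non-negative and bounded.
    return min(max(lam, 0.0), lam_max)


def penalized_objective(mean_return, unsafe_rate, lam):
    """Scalar objective the policy would maximize for a fixed lam (hypothetical)."""
    return mean_return - lam * unsafe_rate
```

In a training loop, one would typically alternate a few policy updates on `penalized_objective` with a call to `update_tradeoff`, so the coefficient grows while violations exceed the target and shrinks once the policy complies.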
Abstract
In safe reinforcement learning (RL), safety cost is typically defined as a function dependent on the immediate state and actions. In practice, safety constraints can often be non-Markovian due to the insufficient