BriefGPT.xyz
Apr, 2023
Joint Learning of Policy with Unknown Temporal Constraints for Safe Reinforcement Learning
Lunet Yifru, Ali Baheri
TL;DR
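The paper proposes a framework that combines a logically constrained reinforcement learning algorithm with an evolutionary algorithm to concurrently learn safety constraints and optimal RL policies in environments where the safety constraints are uncertain or not explicitly defined. The framework is supported by theoretical guarantees. In grid-world environments it identifies acceptable safety constraints and RL policies, demonstrating the practical effectiveness of the approach.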
Abstract
In many real-world applications, safety constraints for reinforcement learning (RL) algorithms are either unknown or not explicitly defined. We propose a framework that concurrently learns …