BriefGPT.xyz
Jun, 2021
Safe Reinforcement Learning Using Advantage-Based Intervention
Nolan Wagener, Byron Boots, Ching-An Cheng
TL;DR
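The paper proposes a new algorithm, SAILR, which uses an intervention mechanism based on advantage functions to keep the agent safe during training, and which optimizes the agent's policy with off-the-shelf reinforcement learning algorithms designed for unconstrained MDPs. Experiments show that the algorithm provides strong safety and good policy performance during both training and deployment.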
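To make the idea of an advantage-based intervention concrete, here is a minimal Python sketch. It is not the authors' implementation: the summary only states that the intervention is based on advantage functions, so the specific rule below (override the proposed action when its estimated safety-cost advantage over a backup policy exceeds a threshold) is an assumption, and the names `advantage_intervention`, `q_cost`, `v_cost`, `backup_policy`, and `threshold` are all hypothetical.

```python
from typing import Any, Callable, Tuple


def advantage_intervention(
    state: Any,
    action: Any,
    q_cost: Callable[[Any, Any], float],  # hypothetical: est. safety cost of `action`, then the backup policy
    v_cost: Callable[[Any], float],       # hypothetical: est. safety cost of the backup policy from `state`
    backup_policy: Callable[[Any], Any],  # hypothetical: a policy assumed to be safe
    threshold: float = 0.0,               # assumed tolerance on the advantage
) -> Tuple[Any, bool]:
    """Return (action_to_execute, intervened).

    Assumed rule: intervene when the proposed action is estimated to be
    riskier than deferring to the backup policy by more than `threshold`.
    """
    advantage = q_cost(state, action) - v_cost(state)
    if advantage > threshold:
        return backup_policy(state), True
    return action, False


# Toy 1-D example (illustrative only): positions above 1.0 count as unsafe,
# and the backup policy simply stays put.
def toy_q(s: float, a: float) -> float:
    return max(0.0, s + a - 1.0)


def toy_v(s: float) -> float:
    return max(0.0, s - 1.0)


def toy_backup(s: float) -> float:
    return 0.0


safe_result = advantage_intervention(0.5, 0.2, toy_q, toy_v, toy_backup)
risky_result = advantage_intervention(0.5, 0.9, toy_q, toy_v, toy_backup)
```

In this toy setup the small step is passed through unchanged, while the large step is replaced by the backup action. Under this sketch, the learner itself can be any standard unconstrained RL algorithm, since the safety check sits outside the policy update.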
Abstract
Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints. Although much recent research has focused on the development of safe reinforcement learning …