BriefGPT.xyz
Jul, 2020
Responsive Safety in Reinforcement Learning by PID Lagrangian Methods
Adam Stooke, Joshua Achiam, Pieter Abbeel
TL;DR
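This work addresses the overshoot and oscillation that Lagrangian methods exhibit in safe reinforcement learning. It proposes a new Lagrange multiplier update method, applies it to deep reinforcement learning, and sets a new state of the art on safety benchmarks such as Safety Gym.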
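As a rough illustration of what a PID-style multiplier update could look like, the sketch below treats the gap between measured episode cost and a cost limit as the error signal of a PID controller. This summary does not spell out the update rule, so everything here is an assumption made for illustration: the class name `PIDLagrangeMultiplier`, the gains `kp`, `ki`, `kd`, the `cost_limit` value, and the clipping choices are hypothetical and not taken from the authors' code.

```python
from dataclasses import dataclass


@dataclass
class PIDLagrangeMultiplier:
    """Hypothetical PID-style Lagrange multiplier (illustrative names and gains)."""
    kp: float = 0.1           # proportional gain (assumed value)
    ki: float = 0.01          # integral gain (assumed value)
    kd: float = 0.05          # derivative gain (assumed value)
    cost_limit: float = 25.0  # constraint threshold (assumed value)
    integral: float = 0.0     # accumulated constraint violation
    prev_cost: float = 0.0    # cost seen at the previous update

    def update(self, episode_cost: float) -> float:
        """Return the new multiplier given the latest measured cost."""
        error = episode_cost - self.cost_limit
        # Integral term: running sum of violations, kept non-negative.
        self.integral = max(0.0, self.integral + error)
        # Derivative term: respond only when cost is rising (an assumption).
        derivative = max(0.0, episode_cost - self.prev_cost)
        self.prev_cost = episode_cost
        # The multiplier itself must stay non-negative.
        return max(0.0, self.kp * error + self.ki * self.integral + self.kd * derivative)


if __name__ == "__main__":
    pid = PIDLagrangeMultiplier(kp=1.0, ki=0.5, kd=0.0, cost_limit=10.0)
    for cost in (14.0, 8.0):
        print(cost, pid.update(cost))
```

With `kp` and `kd` set to zero, this sketch reduces to an integral-only update, which is one way to read a plain gradient-ascent multiplier update; the proportional and derivative terms are the pieces intended to damp oscillation and overshoot.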
Abstract
Lagrangian methods are widely used algorithms for constrained optimization problems, but their learning dynamics exhibit oscillations and overshoot which, when applied to safe reinforcement learning, leads to con…