BriefGPT.xyz
Sep, 2022
Constrained Update Projection Approach to Safe Policy Optimization
Long Yang, Jiaming Ji, Juntao Dai, Linrui Zhang, Binbin Zhou...
TL;DR
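Proposes CUP, a novel policy optimization method built on a constrained update projection framework. CUP comes with safety guarantees and further ensures safety by restricting the agent's exploration of unsafe areas. Experimental results show that CUP achieves strong practical performance and safety.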
Abstract
Safe reinforcement learning (RL) studies problems where an intelligent agent has to not only maximize reward but also avoid exploring unsafe areas. In this study, we propose CUP, a novel policy optimization method …
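Safe RL of this kind is commonly formalized as a constrained MDP: maximize the expected return J_R(π) subject to an expected cost J_C(π) ≤ b. As a minimal illustration of what "projecting" an update onto a constraint can mean, the sketch below takes a plain gradient-ascent step on the reward and then Euclidean-projects the result onto a first-order (linearized) approximation of the cost constraint. This is a generic toy, not the CUP algorithm from the paper. The function names (`project_onto_halfspace`, `projected_update`) and parameters (`lr`, `cost_limit`) are hypothetical, and the sketch assumes NumPy plus already-estimated reward and cost gradients.

```python
import numpy as np


def project_onto_halfspace(theta, g, c):
    """Euclidean projection of theta onto the half-space {x : g @ x <= c}."""
    violation = g @ theta - c
    if violation <= 0.0:
        return theta  # already feasible: nothing to project
    gg = g @ g
    if gg == 0.0:
        # g == 0 with a violated constraint means the half-space is empty
        raise ValueError("infeasible constraint: zero gradient with positive violation")
    # Move along -g just far enough to land on the boundary g @ x == c
    return theta - (violation / gg) * g


def projected_update(theta, reward_grad, cost_grad, cost_value, cost_limit, lr=0.1):
    """Hypothetical two-step update: reward ascent, then projection onto a linearized cost constraint."""
    # Step 1: unconstrained reward-improvement step (gradient ascent)
    theta_improved = theta + lr * reward_grad
    # Step 2: linearize the cost around theta:
    #   J_C(x) ~= cost_value + cost_grad @ (x - theta) <= cost_limit
    #   <=> cost_grad @ x <= cost_limit - cost_value + cost_grad @ theta
    c = cost_limit - cost_value + cost_grad @ theta
    return project_onto_halfspace(theta_improved, cost_grad, c)


if __name__ == "__main__":
    theta = np.zeros(2)
    new_theta = projected_update(
        theta,
        reward_grad=np.array([1.0, 1.0]),
        cost_grad=np.array([1.0, 0.0]),
        cost_value=0.05,
        cost_limit=0.1,
    )
    print(new_theta)
```

In the usage example, the raw step to [0.1, 0.1] would push the linearized cost to 0.15, above the limit of 0.1, so the projection pulls the first coordinate back to 0.05 and leaves the cost-neutral second coordinate untouched. With a single linearized constraint the projection has a closed form, which is why no inner optimizer is needed here.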