BriefGPT.xyz
Feb, 2023
Efficient Exploration Using Extra Safety Budget in Constrained Policy Optimization
Haotian Xu, Shengjie Wang, Zhaolei Wang, Qing Zhuo, Tao Zhang
TL;DR
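This paper proposes the ESB-CPO algorithm, which balances exploration against constraints by adding an extra safety budget in the early stage of training to make the process more efficient, and shows that it can significantly improve performance while still guaranteeing safety.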
Abstract
Reinforcement learning (RL) has achieved promising results on most robotic control tasks. Safety of learning-based controllers is an essen…