BriefGPT.xyz
Jan, 2022
Constrained Policy Optimization via Bayesian World Models
Yarden As, Ilnura Usmanova, Sebastian Curi, Andreas Krause
TL;DR
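LAMBDA is a novel model-based policy optimization method that uses Bayesian world models to improve the sample efficiency and safety of reinforcement learning, and it performs strongly on the Safety-Gym benchmark.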
Abstract
Improving sample-efficiency and safety are crucial challenges when deploying reinforcement learning in high-stakes real world applications. We propose LAMBDA, a novel model-based approach for policy optimization …