BriefGPT.xyz
Oct, 2022
Safe Policy Improvement in Constrained Markov Decision Processes
Luigi Berducci, Radu Grosu
TL;DR
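This study proposes an algorithm for automatically synthesizing policies through reinforcement learning. It addresses two challenges, reward shaping and safe policy updates, uses a model-based RL algorithm to make efficient use of the collected data, and demonstrates its effectiveness and robustness on standard control benchmarks.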
Abstract
The automatic synthesis of a policy through reinforcement learning (RL) from a given set of formal requirements depends on the construction of a reward signal and consists of the iterative application of many pol