BriefGPT.xyz
Nov, 2019
Safe Policies for Reinforcement Learning via Primal-Dual Methods
Santiago Paternain, Miguel Calvo-Fullana, Luiz F. O. Chamon, Alejandro Ribeiro
TL;DR
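This work studies the problem of learning to control a Markov decision process so that it remains in a desired safe set with high probability throughout its operation time. The problem is treated as a constrained Markov decision process, and a differential relaxation of the problem is proposed so that policies with optimal safety guarantees can be found.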
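As a rough illustration of what a primal-dual treatment of a constrained Markov decision process can look like, the block below writes out a generic formulation. The notation is assumed for this sketch and is not taken from the paper: $\theta$ parametrizes the policy, $V_0(\theta)$ is the expected return, $V_1(\theta)$ is a safety-related value, $c$ is the required safety level, and $\eta_\theta, \eta_\lambda > 0$ are step sizes.

```latex
% Generic constrained policy optimization problem (assumed notation)
\begin{aligned}
  \max_{\theta}\ & V_0(\theta) \\
  \text{s.t.}\ & V_1(\theta) \ge c
\end{aligned}

% Lagrangian with multiplier \lambda \ge 0
\mathcal{L}(\theta, \lambda) = V_0(\theta) + \lambda \bigl( V_1(\theta) - c \bigr)

% Primal ascent step on the policy parameters
\theta_{k+1} = \theta_k + \eta_\theta \, \nabla_\theta \mathcal{L}(\theta_k, \lambda_k)

% Projected dual descent step on the multiplier
\lambda_{k+1} = \bigl[ \lambda_k - \eta_\lambda \bigl( V_1(\theta_{k+1}) - c \bigr) \bigr]_+
```

In this sketch the multiplier grows while the safety constraint is violated ($V_1 < c$), which shifts weight toward safety in the primal step, and it shrinks toward zero once the constraint holds with slack.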
Abstract
In this paper, we study the learning of safe policies in the setting of reinforcement learning problems. That is, we aim to control a Markov deci…