BriefGPT.xyz
Jun, 2022
Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm
Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal
TL;DR
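Proposes a novel C-NPG-PD algorithm that achieves global optimality and reduces sample complexity, addressing constrained Markov decision process problems in continuous state-action spaces.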
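As a rough illustration of the problem class named in the summary, a constrained Markov decision process is commonly written as a reward maximization subject to utility constraints. The formulation below is a generic sketch, not taken from this page: the discounted infinite-horizon setting, the discount factor \gamma, the utility functions g_i, the number of constraints I, and the zero threshold are all assumed notation.

```latex
\max_{\pi} \; J_r(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
J_{g_i}(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, g_i(s_t, a_t)\right] \ge 0,
\qquad i = 1, \dots, I.
```

Under this assumed notation, "zero constraint violation" would mean that the learned policy satisfies every inequality J_{g_i}(\pi) \ge 0 exactly, rather than only up to a small error.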
Abstract
We consider the problem of constrained Markov decision process (CMDP) in continuous state-action spaces, where the goal is to maximize the expected cumulative reward subject to some constraints. We propose a novel conse