BriefGPT.xyz
Sep, 2021
Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach
Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, Alec Koppel, Vaneet Aggarwal
TL;DR
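This work proposes a Conservative Stochastic Primal-Dual Algorithm (CSPDA) for reinforcement learning problems formulated as constrained Markov decision processes (CMDPs). The algorithm achieves an ε-optimal cumulative reward with zero constraint violation, at a more efficient complexity than existing algorithms.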
Abstract
Reinforcement learning is widely used in applications where one needs to perform sequential decisions while interacting with the environment. The problem becomes more challenging when the decision requirement includes satisfying some…