BriefGPT.xyz
Dec, 2023
Interior Point Constrained Reinforcement Learning with Global Convergence Guarantees
Tingting Ni, Maryam Kamgarpour
TL;DR
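For infinite-horizon constrained Markov decision processes, a zeroth-order interior-point method maximizes the expected cumulative reward while satisfying the constraints, keeps the policy feasible throughout learning, and has sample complexity O(ε^(-6)).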
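To make the idea in the TL;DR concrete, here is a minimal sketch of a zeroth-order log-barrier (interior-point) ascent on a toy two-dimensional problem. It is not the authors' algorithm. The functions `reward` and `slack`, the barrier weight `eta`, the smoothing radius `delta`, and the step-size cap are all assumptions chosen for illustration, and the toy objective stands in for the policy value functions of a real CMDP.

```python
import math
import random

# Hypothetical toy stand-ins (assumptions, not from the paper):
# reward(x) plays the role of the expected cumulative reward,
# slack(x) > 0 plays the role of the constraint being strictly satisfied.
def reward(x):
    return -(x[0] - 2.0) ** 2 - (x[1] - 2.0) ** 2

def slack(x):
    return 1.0 - x[0] - x[1]

LIP = math.sqrt(2.0)  # Lipschitz constant of slack() for this toy constraint

def barrier(x, eta):
    # Log-barrier surrogate: only defined at strictly feasible points.
    return reward(x) + eta * math.log(slack(x))

def zo_gradient(x, eta, delta, batch, rng):
    # Two-point zeroth-order estimate using function values only.
    # Shrink the smoothing radius so both query points stay strictly feasible.
    delta = min(delta, 0.5 * slack(x) / LIP)
    g = [0.0, 0.0]
    for _ in range(batch):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        u = (math.cos(angle), math.sin(angle))  # random unit direction
        plus = barrier((x[0] + delta * u[0], x[1] + delta * u[1]), eta)
        minus = barrier((x[0] - delta * u[0], x[1] - delta * u[1]), eta)
        scale = 2.0 * (plus - minus) / (2.0 * delta) / batch  # dimension d = 2
        g[0] += scale * u[0]
        g[1] += scale * u[1]
    return g

def run(steps=500, eta=0.1, step_size=0.02, delta=0.01, batch=8, seed=0):
    rng = random.Random(seed)
    x = [0.0, 0.0]  # strictly feasible starting point
    min_slack = slack(x)
    for _ in range(steps):
        g = zo_gradient(x, eta, delta, batch, rng)
        norm = math.hypot(g[0], g[1])
        # Cap the step so the slack can at most halve: iterates stay feasible.
        step = min(step_size, slack(x) / (2.0 * LIP * norm + 1e-12))
        x = [x[0] + step * g[0], x[1] + step * g[1]]
        min_slack = min(min_slack, slack(x))
    return x, min_slack

if __name__ == "__main__":
    x, min_slack = run()
    print("final point:", x)
    print("final reward:", reward(x))
    print("smallest slack seen:", min_slack)
```

The step-size cap is what keeps every iterate in the interior of the feasible set. This mirrors the "feasible during learning" property the summary describes, but the cap here is derived for this toy linear constraint only.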
Abstract
We consider discounted infinite horizon constrained Markov decision processes (CMDPs) where the goal is to find an optimal policy that max…