BriefGPT.xyz
Apr, 2022
Towards Painless Policy Optimization for Constrained MDPs
Arushi Jain, Sharan Vaswani, Reza Babanezhad, Csaba Szepesvari, Doina Precup
TL;DR
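This work studies policy optimization in infinite-horizon, discounted constrained Markov decision processes. It proposes a generic primal-dual framework for analyzing algorithm performance, instantiates that framework with coin-betting algorithms, and proves bounds on how closely the resulting policy approaches the optimal objective while satisfying the constraint. Unlike other approaches, the method does not require hyperparameter tuning. Experiments on synthetic and Cartpole environments demonstrate its effectiveness and robustness.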
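To give a sense of why coin-betting methods need no step size, here is a minimal sketch of the standard Krichevsky-Trofimov (KT) coin-betting update for one-dimensional online learning. It is not the paper's algorithm or its primal-dual instantiation, which this summary does not spell out. The function name `kt_coin_betting`, the toy loss `|w - 0.5|`, and the assumption that subgradients lie in `[-1, 1]` are all illustrative choices.

```python
def kt_coin_betting(subgradient, num_rounds, initial_wealth=1.0):
    """Run a KT coin-betting learner on a 1-D problem (illustrative sketch).

    `subgradient(w)` is assumed to return a value in [-1, 1].
    Returns the list of iterates; no learning rate is involved.
    """
    wealth = initial_wealth   # money the bettor currently holds
    coin_sum = 0.0            # running sum of past coin outcomes
    iterates = []
    for t in range(1, num_rounds + 1):
        bet_fraction = coin_sum / t      # KT estimate, always in (-1, 1)
        w = bet_fraction * wealth        # the iterate is the signed bet
        iterates.append(w)
        coin = -subgradient(w)           # negative subgradient is the coin outcome
        wealth += coin * w               # win or lose the amount that was bet
        coin_sum += coin
    return iterates


def toy_subgradient(w):
    """Subgradient of the hypothetical toy loss |w - 0.5|."""
    return (w > 0.5) - (w < 0.5)


iterates = kt_coin_betting(toy_subgradient, num_rounds=2000)
average_iterate = sum(iterates) / len(iterates)
print(f"average iterate: {average_iterate:.3f}")
```

The iterate is the bet itself, so its scale adapts through the accumulated wealth rather than through a tuned step size. On this toy loss the average iterate should settle near the minimizer 0.5.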
Abstract
We study policy optimization in an infinite-horizon, $\gamma$-discounted constrained Markov decision process (CMDP). Our objective is to return a policy that achieves large expected reward with a small constraint …