BriefGPT.xyz
Oct, 2021
Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process
Tianjiao Li, Ziwei Guan, Shaofeng Zou, Tengyu Xu, Yingbin Liang...
TL;DR
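This paper proposes a new primal-dual approach for the constrained Markov decision process (CMDP) problem. It combines three ingredients: an entropy-regularized policy optimizer, a dual-variable regularizer, and a Nesterov accelerated gradient descent dual optimizer. The approach converges globally and attains a complexity of O(1/ε), the optimal rate for convex optimization subject to convex constraints, which existing primal-dual algorithms do not reach.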
Abstract
The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its utilities/costs. A new primal-...
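The CMDP problem described in the abstract can be written out as follows. This is a sketch in standard textbook notation, not necessarily the paper's own: the symbols $\rho$ (initial-state distribution), $\gamma$ (discount factor), $r_0$ (reward), $r_1, \dots, r_m$ (utilities), $c_i$ (thresholds), and $\lambda$ (dual variables) are all assumptions, as is the choice to write each constraint as a lower bound on a utility rather than an upper bound on a cost.

```latex
% Assumed notation: \rho initial-state distribution, \gamma discount factor,
% r_0 reward, r_1..r_m utilities, c_i constraint thresholds.

% Expected accumulated discounted value of signal r_i under policy \pi
V_i^{\pi}(\rho) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_i(s_t, a_t) \,\middle|\, s_0 \sim \rho,\ \pi\right],
\quad i = 0, 1, \dots, m

% Constrained problem: maximize reward subject to m utility constraints
\max_{\pi}\ V_0^{\pi}(\rho) \quad \text{subject to} \quad V_i^{\pi}(\rho) \ge c_i, \quad i = 1, \dots, m

% Lagrangian with nonnegative dual variables \lambda
L(\pi, \lambda) = V_0^{\pi}(\rho) + \sum_{i=1}^{m} \lambda_i \left( V_i^{\pi}(\rho) - c_i \right),
\qquad \lambda \in \mathbb{R}_{\ge 0}^{m}

% Saddle-point form that a primal-dual method works on
\max_{\pi}\ \min_{\lambda \ge 0}\ L(\pi, \lambda)
```

In this form, a primal-dual method alternates between improving the policy $\pi$ for fixed $\lambda$ and updating $\lambda$ to penalize violated constraints. The ingredients named in the TL;DR (entropy regularization on the policy side, a regularizer and Nesterov acceleration on the dual side) would modify these two updates, but their exact form is not given in this excerpt.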