BriefGPT.xyz
Feb, 2024
Truly No-Regret Learning in Constrained MDPs
Adrian Müller, Pragnya Alatur, Volkan Cevher, Giorgia Ramponi, Niao He
TL;DR
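This paper proposes a model-based algorithm, built on a regularized primal-dual scheme, for learning in an unknown CMDP with multiple constraints, and proves that it achieves sublinear regret without error cancellations.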
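To make "without error cancellations" concrete, here is a minimal sketch that contrasts two ways of accumulating constraint violation over episodes. The function names and the toy violation sequence are hypothetical and are not taken from the paper; the sketch only assumes that a signed per-episode violation is available, with positive values meaning the constraint was violated.

```python
def weak_violation(signed_violations):
    """Cumulative violation where strictly safe episodes (negative values)
    can cancel out unsafe ones (positive values)."""
    return max(sum(signed_violations), 0.0)


def strong_violation(signed_violations):
    """Cumulative violation that counts only the positive part of each
    episode, so no cancellation between episodes is possible."""
    return sum(max(v, 0.0) for v in signed_violations)


# Hypothetical learner that oscillates between violating the constraint
# by 1 and over-satisfying it by 1, for 100 episodes.
oscillating = [1.0, -1.0] * 50

print(weak_violation(oscillating))
print(strong_violation(oscillating))
```

Under the cancelling measure the oscillating learner looks perfectly safe, while the non-cancelling measure grows linearly with the number of episodes. A guarantee of the second kind is the stricter one.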
Abstract
Constrained Markov decision processes (CMDPs) are a common way to model safety constraints in reinforcement learning. State-of-the-art met…