BriefGPT.xyz
Jun, 2021
A Provably-Efficient Model-Free Algorithm for Constrained Markov Decision Processes
Honghao Wei, Xin Liu, Lei Ying
TL;DR
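This paper proposes Triple-Q, a simulator-free, model-free reinforcement learning algorithm for constrained Markov decision processes (CMDPs) that achieves sublinear regret and zero constraint violation.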
Abstract
This paper presents the first model-free, simulator-free reinforcement learning algorithm for constrained Markov decision processes (CMDPs) with sublinear