BriefGPT.xyz
Sep, 2023
Provably Efficient Exploration in Constrained Reinforcement Learning: Posterior Sampling Is All You Need
Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein
TL;DR
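A posterior-sampling-based algorithm achieves near-optimal regret bounds for constrained Markov decision processes (CMDPs) in the infinite-horizon undiscounted setting, and it shows an empirical advantage over existing algorithms.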
Abstract
We present a new algorithm based on posterior sampling for learning in constrained Markov decision processes (CMDP) in the infinite-horizo