BriefGPT.xyz
Oct, 2023
Regret Analysis of the Posterior Sampling-based Learning Algorithm for Episodic POMDPs
Dengwang Tang, Rahul Jain, Ashutosh Nayyar, Pierluigi Nuzzo
TL;DR
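This paper studies episodic learning in POMDPs with unknown transition and observation models, and proves that the Bayesian regret scales as the square root of the number of episodes.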
Abstract
Compared to Markov Decision Processes (MDPs), learning in Partially Observable Markov Decision Processes (POMDPs) can be significantly harder due to the difficulty of interpreting observations. In this paper, we …