BriefGPT.xyz
Oct, 2023
Prospective Side Information for Latent MDPs
Jeongyeol Kwon, Yonathan Efroni, Shie Mannor, Constantine Caramanis
TL;DR
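In many interactive decision-making settings, there is latent, unobserved information that remains fixed. This paper studies a class of latent Markov decision processes (LMDPs) in which the agent receives prospective side information about the latent context. It proves that any sample-efficient algorithm must suffer at least Ω(K^(2/3)) regret, where K is the number of episodes, and proposes an algorithm with a matching upper bound.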
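To make the setting concrete, here is a minimal Python sketch of a single episode in a toy tabular LMDP with side information. It illustrates the idea under stated assumptions and is not the authors' formulation. The function name `run_episode`, the tabular representation, the fixed start state, and the noisy-label model of side information are all hypothetical choices made here. In that model, the signal equals the true context with probability `side_info_accuracy` and is otherwise a uniformly random wrong context.

```python
import random


def run_episode(weights, transitions, rewards, policy, horizon,
                side_info_accuracy, rng):
    """Simulate one episode of a hypothetical tabular LMDP with side information.

    weights[m]           -- probability that latent context m is drawn
    transitions[m][s][a] -- next-state probabilities under context m
    rewards[m][s][a]     -- reward for action a in state s under context m
    policy(signal, s, t) -- returns an action; it sees the signal, never m
    Returns (true context, side-information signal, episode return).
    """
    contexts = range(len(weights))
    # Latent context: drawn once at the start and fixed for the whole episode.
    m = rng.choices(contexts, weights=weights)[0]
    # Prospective side information (assumed noisy-label model): revealed before
    # the episode, correct with probability side_info_accuracy, otherwise a
    # uniformly random wrong context.
    others = [c for c in contexts if c != m]
    if not others or rng.random() < side_info_accuracy:
        signal = m
    else:
        signal = rng.choice(others)
    s, total = 0, 0.0  # every episode starts in state 0 (assumption)
    for t in range(horizon):
        a = policy(signal, s, t)
        total += rewards[m][s][a]
        next_probs = transitions[m][s][a]
        s = rng.choices(range(len(next_probs)), weights=next_probs)[0]
    return m, signal, total


# Toy instance: one state, two actions; context m pays 1.0 only for action m.
weights = [0.5, 0.5]
transitions = [[[[1.0], [1.0]]], [[[1.0], [1.0]]]]  # single self-looping state
rewards = [[[1.0, 0.0]], [[0.0, 1.0]]]


def follow_signal(signal, s, t):
    """Act as if the side information were correct."""
    return signal


m, signal, total = run_episode(weights, transitions, rewards, follow_signal,
                               horizon=5, side_info_accuracy=0.8,
                               rng=random.Random(0))
print(m, signal, total)
```

The policy only ever sees `signal`, never `m`. In this toy instance, a wrong hint therefore costs the whole episode's reward, which shows why a fixed but unobserved context is hard to exploit.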
Abstract
In many interactive decision-making settings, there is latent and unobserved information that remains fixed. Consider, for example, a dialogue system, where complete information about a user, such as the user's p