Partially observable Markov decision processes (POMDPs) have been widely
used to model many real-world applications. However, existing theoretical
results have shown that learning in general POMDPs can be intractable, mainly
because the agent lacks access to the latent state. A fundamental question
here is how much hindsight state information (HSI) is sufficient to achieve
tractability. In this paper, we establish a lower bound that reveals a
surprising hardness result: unless full HSI is available, the sample
complexity required to obtain an $\epsilon$-optimal policy for POMDPs scales
exponentially. Nonetheless, guided by key insights from our lower-bound
construction, we find that important classes of POMDPs remain tractable
even with partial HSI. In particular, for two novel classes of POMDPs with
partial HSI, we provide new algorithms and show that they are near-optimal
by establishing new upper and lower bounds.
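To make the distinction between full and partial HSI concrete, the toy simulation below may help. It is a sketch under stated assumptions, not the model or algorithms of this paper: it assumes the latent state is a vector of `d` binary components, that the agent sees only a noisy parity of that state while acting, and that hindsight information reveals a chosen subset of components after the episode ends. The names `run_episode`, `Episode`, and `revealed`, and the transition, observation, and reward rules, are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

State = Tuple[int, ...]  # hypothetical: latent state as d binary components


@dataclass
class Episode:
    observations: List[int]
    actions: List[int]
    rewards: List[float]
    # Hindsight feedback per step; None marks a state component that stays hidden.
    hindsight: List[Tuple[Optional[int], ...]]


def run_episode(H: int, d: int, revealed: Sequence[int], seed: int = 0) -> Episode:
    """Simulate one episode of a toy POMDP and return hindsight feedback.

    `revealed` lists the state components disclosed after the episode ends:
    all of range(d) models full HSI, a strict subset models partial HSI.
    """
    rng = random.Random(seed)
    revealed_set = set(revealed)
    state: State = tuple(rng.randint(0, 1) for _ in range(d))
    states: List[State] = []
    obs: List[int] = []
    acts: List[int] = []
    rews: List[float] = []
    for _ in range(H):
        # Online, the agent sees only the parity of the state, flipped w.p. 0.1.
        o = sum(state) % 2
        if rng.random() < 0.1:
            o = 1 - o
        a = rng.randint(0, 1)  # placeholder: uniformly random policy
        r = 1.0 if a == state[0] else 0.0
        states.append(state)
        obs.append(o)
        acts.append(a)
        rews.append(r)
        # Toy transition: the action flips component 0; every other
        # component is resampled with probability 0.2.
        state = tuple(
            (s ^ a) if i == 0 else (rng.randint(0, 1) if rng.random() < 0.2 else s)
            for i, s in enumerate(state)
        )
    # After the episode, reveal only the components in `revealed_set`.
    hindsight = [
        tuple(s[i] if i in revealed_set else None for i in range(d)) for s in states
    ]
    return Episode(obs, acts, rews, hindsight)


full = run_episode(H=5, d=3, revealed=range(3), seed=1)
partial = run_episode(H=5, d=3, revealed=[0], seed=1)
```

With the same seed, both calls generate the same trajectory, so the only difference is how much of the latent state is disclosed afterwards. The online observations alone (a noisy parity) do not identify the state, which is the kind of information gap that HSI is meant to close.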

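This paper studies partially observable Markov decision processes (POMDPs) and finds that, unless full hindsight state information is available, exponential sample complexity is required to obtain an $\epsilon$-optimal policy for a POMDP. However, for certain classes of POMDPs, partial state information is sufficient; for these classes, the paper proposes new algorithms and proves that they are near-optimal.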