In this paper, we study representation learning in partially observable Markov Decision Processes (POMDPs). Here the agent learns a decoder function that maps a sequence of high-dimensional raw observations to a compact representation, and it uses that representation for more efficient exploration and planning. We focus on the subclasses of \textit{$\gamma$-observable} and \textit{decodable POMDPs}. For these classes, statistically tractable learning has been shown to be possible, but no computationally efficient algorithm was previously known. We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU) to perform representation learning and achieve sample-efficient learning, while calling only supervised-learning computational oracles. We then show how to adapt this algorithm to the broader class of $\gamma$-observable POMDPs.
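For reference, the two subclasses named above are usually defined as follows. This is a sketch of the standard definitions from prior work. The symbols $m$, $\phi^\star$, $z_h$ and $\mathbb{O}_h$ are assumed notation, because the paper's own notation does not appear in this section.

```latex
% m-step decodability: the current latent state is a fixed (unknown) function
% of a short window of recent observations and actions.
\exists\, \phi^\star:\quad s_h = \phi^\star(z_h), \qquad
z_h = (o_{h-m+1}, a_{h-m+1}, \dots, a_{h-1}, o_h).

% gamma-observability: the emission matrix
% \mathbb{O}_h \in \mathbb{R}^{|\mathcal{O}| \times |\mathcal{S}|},
% [\mathbb{O}_h]_{o,s} = \mathbb{O}_h(o \mid s),
% keeps distinct beliefs over latent states distinguishable.
\|\mathbb{O}_h b - \mathbb{O}_h b'\|_1 \;\ge\; \gamma\, \|b - b'\|_1
\quad \text{for all } b, b' \in \Delta(\mathcal{S}).
```

With $m = 1$, decodability reduces to the block-MDP setting, in which each observation alone identifies the latent state.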

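This paper studies representation learning in partially observable Markov decision processes. The agent learns to map high-dimensional raw observations to a compact representation, which it then uses for more efficient exploration and planning. We propose a representation learning algorithm based on maximum likelihood estimation and optimism in the face of uncertainty. The algorithm attains efficient sample complexity while remaining computationally efficient.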
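The two ingredients named above, MLE and optimism, can be illustrated in isolation. The sketch below uses brute-force maximization over a finite model class as a stand-in for an MLE oracle. It implements optimism with an elliptical bonus $\alpha\sqrt{\phi^\top \Sigma^{-1}\phi}$ over learned features, which is common in low-rank representation learning. This section does not say that this is the exact bonus or procedure the paper uses. All names (`mle_select`, `covariance`, `solve`, `elliptical_bonus`) and parameters (`lam`, `alpha`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mle_select(candidates: Sequence[Callable[[int, int], float]],
               data: Sequence[Tuple[int, int]]) -> int:
    """Index of the candidate with the highest log-likelihood on `data`.

    Each candidate is a hypothetical conditional model prob(x, y) = P(y | x) > 0;
    brute-force maximization over a finite class stands in for an MLE oracle.
    """
    def log_likelihood(model: Callable[[int, int], float]) -> float:
        return sum(math.log(model(x, y)) for x, y in data)

    return max(range(len(candidates)), key=lambda i: log_likelihood(candidates[i]))


def covariance(features: Sequence[Vector], dim: int, lam: float) -> Matrix:
    """Regularized empirical covariance: lam * I + sum_i phi_i phi_i^T."""
    sigma = [[lam if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for phi in features:
        for i in range(dim):
            for j in range(dim):
                sigma[i][j] += phi[i] * phi[j]
    return sigma


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def elliptical_bonus(phi: Vector, sigma: Matrix, alpha: float) -> float:
    """Optimism bonus alpha * sqrt(phi^T sigma^{-1} phi); large in rarely visited directions."""
    v = solve(sigma, phi)
    return alpha * math.sqrt(sum(p * q for p, q in zip(phi, v)))
```

Consider three visits along the first feature direction with `lam=1.0`. Then `sigma` is `diag(4, 1)`, so the bonus is 0.5 along the visited direction and 1.0 along the unvisited one. This larger bonus is what steers an optimistic planner toward unexplored parts of the representation.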

Provably Efficient and Tractable Representation Learning in Low-Rank POMDPs