We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs). In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs. To sidestep known impossibility results, we consider several notions of separation of the constituent MDPs. The main thrust of this paper is establishing a nearly sharp *statistical threshold* for the horizon length necessary for efficient learning. On the computational side, we show that under a weaker assumption of separability under the optimal policy, there is a quasi-polynomial-time algorithm whose time complexity scales with the statistical threshold. We further show a near-matching time complexity lower bound under the Exponential Time Hypothesis.
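To make the interaction model concrete, the following is a minimal sketch of one epoch of interaction with an LMDP. It is an illustration under stated assumptions, not the paper's algorithm: the names `LatentMDP`, `run_epoch`, `STAY`, and `FLIP`, the tabular representation, and the two-state toy instance are all hypothetical. The only feature taken from the description above is that the latent MDP index is drawn once at the start of the epoch from the mixture and is never revealed to the learner.

```python
import random
from dataclasses import dataclass


@dataclass
class LatentMDP:
    # All field names are hypothetical; this is a plain tabular representation.
    weights: list        # mixing weights over the constituent MDPs
    transitions: list    # transitions[m][s][a] = next-state probabilities
    rewards: list        # rewards[m][s][a] = deterministic reward
    initial_state: int = 0


def run_epoch(lmdp, policy, horizon, rng):
    """Simulate one epoch; the latent index m is hidden from the policy."""
    # The constituent MDP is drawn once, at the beginning of the epoch.
    m = rng.choices(range(len(lmdp.weights)), weights=lmdp.weights)[0]
    state, history, total = lmdp.initial_state, [], 0.0
    for _ in range(horizon):
        # The policy sees only the observable history and the current state.
        action = policy(history, state)
        reward = lmdp.rewards[m][state][action]
        probs = lmdp.transitions[m][state][action]
        next_state = rng.choices(range(len(probs)), weights=probs)[0]
        history.append((state, action, reward))
        total += reward
        state = next_state
    # m is returned for inspection only; a learner would not have access to it.
    return m, history, total


# Toy instance (assumed for illustration): two states, two actions, two MDPs.
STAY = [[1.0, 0.0], [0.0, 1.0]]  # STAY[s]: remain in state s
FLIP = [[0.0, 1.0], [1.0, 0.0]]  # FLIP[s]: move to the other state
transitions = [
    [[STAY[s], FLIP[s]] for s in range(2)],  # MDP 0: action 1 flips the state
    [[FLIP[s], STAY[s]] for s in range(2)],  # MDP 1: action 0 flips the state
]
# Reward 1 whenever the current state is 1, regardless of action or MDP.
rewards = [[[float(s == 1)] * 2 for s in range(2)] for _ in range(2)]

if __name__ == "__main__":
    lmdp = LatentMDP([0.5, 0.5], transitions, rewards)
    m, history, total = run_epoch(lmdp, lambda h, s: 1, 4, random.Random(0))
    print(m, history, total)
```

In this toy instance the two constituent MDPs react differently to the same action, so the observed trajectory carries information about the hidden index. How long the horizon must be for that information to suffice is the kind of question the statistical threshold addresses; the sketch itself makes no claim about that threshold.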

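We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs). The main goal of this paper is to establish a nearly sharp statistical threshold for the horizon length required for efficient learning. On the computational side, we show that under a weaker assumption of separability under the optimal policy, there is a quasi-polynomial-time algorithm whose time complexity scales with the statistical threshold. We also show a near-matching time complexity lower bound under the Exponential Time Hypothesis.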

Near-Optimal Learning and Planning in Separated Latent Markov Decision Processes