On the Complexity of Policy Iteration

Jan, 2013

Yishay Mansour, Satinder Singh

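TL;DR: This paper studies the convergence and complexity of policy iteration for Markov decision processes. It derives upper bounds on the number of iterations policy iteration needs to converge to an optimal policy, and these bounds do not depend on the value of the discount factor.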

Abstract

Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy-space, the size of whic