Jan, 2013
On the Complexity of Policy Iteration
Yishay Mansour, Satinder Singh
TL;DR
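This paper studies the convergence and complexity of the policy iteration algorithm for Markov decision processes. It derives upper bounds on the number of iterations policy iteration needs to converge to an optimal policy, and these bounds do not depend on the value of the discount factor.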
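To make the quantity being bounded concrete, here is a minimal sketch of policy iteration that counts its own iterations. It is an illustration only, not the authors' code: the names `P`, `R`, `gamma`, `solve`, `evaluate`, and `policy_iteration`, the two-state toy MDP, and the choice to switch every improvable state in each round are all assumptions made for this example.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(P, R, gamma, policy):
    """Exact value of a deterministic policy: solve (I - gamma * P_pi) v = r_pi."""
    n = len(policy)
    A = [[(1.0 if s == t else 0.0) - gamma * P[s][policy[s]][t]
          for t in range(n)] for s in range(n)]
    b = [R[s][policy[s]] for s in range(n)]
    return solve(A, b)


def policy_iteration(P, R, gamma):
    """Return (policy, values, iterations) for a finite discounted MDP.

    P[s][a][t] is the probability of moving from state s to state t under
    action a, and R[s][a] is the immediate reward (hypothetical layout).
    """
    n = len(P)
    policy = [0] * n          # arbitrary starting policy
    iterations = 0
    while True:
        iterations += 1
        v = evaluate(P, R, gamma, policy)
        changed = False
        for s in range(n):
            q = [R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n))
                 for a in range(len(P[s]))]
            best = max(range(len(q)), key=q.__getitem__)
            # Switch only on a strict improvement so the loop terminates.
            if q[best] > q[policy[s]] + 1e-12:
                policy[s] = best
                changed = True
        if not changed:
            return policy, v, iterations


# Toy MDP (made up for illustration): state 1 pays reward 1 for staying put.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: action 0 stays, action 1 moves to state 0
]
R = [
    [0.0, 0.0],
    [1.0, 0.0],
]
policy, values, iterations = policy_iteration(P, R, gamma=0.9)
```

With n states and m actions per state there are m^n deterministic policies, so that count is a trivial ceiling on `iterations`; results of the kind summarized above aim to tighten it without reference to `gamma`. On this toy problem the loop stops after two evaluation rounds, with state 0 moving to state 1 and state 1 staying put.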
Abstract
Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy-space, the size of which…