BriefGPT.xyz
May, 2012
Approximate Modified Policy Iteration
Bruno Scherrer, Victor Gabillon, Mohammad Ghavamzadeh, Matthieu Geist
TL;DR
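This paper studies approximate forms of the Modified Policy Iteration (MPI) algorithm. It proposes three extended DP algorithms suited to large state and action spaces, namely fitted value iteration, fitted Q-iteration, and classification-based policy iteration, and provides a unified error propagation analysis for them. For the classification-based implementation, it also develops a finite-sample analysis showing how MPI's main parameter controls the classifier's estimation error and the overall quality of the value function approximation.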
Abstract
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially …
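To make concrete how MPI contains both value iteration and policy iteration, here is a minimal sketch of the exact, tabular form of the algorithm. It is not one of the paper's approximate implementations. The function name `modified_policy_iteration`, the array layout of `P` and `R`, and the two-state toy MDP are all assumptions made for illustration. Each iteration takes a greedy policy with respect to the current value estimate and then applies that policy's Bellman operator `m` times: `m = 1` gives value iteration, and large `m` approaches policy iteration.

```python
import numpy as np


def modified_policy_iteration(P, R, gamma, m, n_iters=500):
    """Exact tabular MPI sketch (illustrative, not the paper's approximate variants).

    P: array of shape (A, S, S), P[a, s, t] = probability of s -> t under action a
    R: array of shape (S, A), expected immediate reward
    m: number of applications of the greedy policy's Bellman operator per iteration
    """
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    v = np.zeros(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(n_iters):
        # Greedy step: policy maximizing the one-step lookahead on v.
        q = R + gamma * np.einsum("ast,t->sa", P, v)
        policy = q.argmax(axis=1)
        # Partial evaluation step: apply T_policy to v exactly m times.
        P_pi = P[policy, states]   # shape (S, S), row s is P[policy[s], s, :]
        r_pi = R[states, policy]   # shape (S,)
        for _ in range(m):
            v = r_pi + gamma * P_pi @ v
    return v, policy


# Hypothetical two-state, two-action MDP used only for this example.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # transitions under action 0
    [[0.1, 0.9], [0.7, 0.3]],  # transitions under action 1
])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

v_vi, pi_vi = modified_policy_iteration(P, R, gamma, m=1)    # value iteration
v_pi, pi_pi = modified_policy_iteration(P, R, gamma, m=50)   # close to policy iteration
print(v_vi, pi_vi)
print(v_pi, pi_pi)
```

Both settings of `m` should reach the same optimal value function on this toy problem. They differ in how much work each iteration spends on evaluating the current greedy policy, which is the trade-off that the parameter `m` governs.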