BriefGPT.xyz
Jan, 2023
On The Convergence Of Policy Iteration-Based Reinforcement Learning With Monte Carlo Policy Evaluation
Anna Winnicki, R. Srikant
TL;DR
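This paper addresses a long-standing open problem in reinforcement learning. It improves the policy using lookahead rather than a simple greedy policy-iteration step, and gives results in both the tabular and the function-approximation settings. It proves that this policy iteration scheme converges to the optimal policy.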
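To make the difference between a greedy step and a lookahead step concrete, here is a minimal sketch of H-step lookahead policy improvement on a small tabular MDP. It assumes (this is not taken from the paper) that the transition model P, shaped (S, A, S), and the rewards R, shaped (S, A), are known. The helper names bellman_optimality and lookahead_policy and the toy chain MDP are hypothetical and only illustrative. With H = 1 the step reduces to the ordinary greedy policy.

```python
import numpy as np


def bellman_optimality(V, P, R, gamma):
    """One application of the Bellman optimality operator:
    (TV)(s) = max_a [R(s, a) + gamma * sum_s' P(s' | s, a) V(s')]."""
    Q = R + gamma * (P @ V)  # P: (S, A, S), V: (S,) -> Q: (S, A)
    return Q.max(axis=1)


def lookahead_policy(V, P, R, gamma, H):
    """H-step lookahead: apply T to V (H - 1) times, then act greedily.
    H = 1 is the ordinary greedy policy with respect to V."""
    W = V.copy()
    for _ in range(H - 1):
        W = bellman_optimality(W, P, R, gamma)
    Q = R + gamma * (P @ W)
    return Q.argmax(axis=1)  # ties go to the lowest action index


# Hypothetical 3-state chain: action 0 = stay, action 1 = move right.
# Only state 2 pays a reward (1 per step), and it is absorbing.
S, A = 3, 2
P = np.zeros((S, A, S))
for s in range(S):
    P[s, 0, s] = 1.0
    P[s, 1, min(s + 1, S - 1)] = 1.0
R = np.zeros((S, A))
R[S - 1, :] = 1.0

V0 = np.zeros(S)  # a deliberately poor value estimate
print(lookahead_policy(V0, P, R, 0.9, H=1))  # greedy: [0 0 0]
print(lookahead_policy(V0, P, R, 0.9, H=3))  # lookahead: [1 1 0]
```

With the all-zero estimate, the greedy step (H = 1) sees only ties in states 0 and 1 and keeps choosing "stay". Three-step lookahead already moves right from both states. In this toy case, deeper lookahead compensates for a poor value estimate.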
Abstract
A common technique in reinforcement learning is to evaluate the value function from Monte Carlo simulations of a given policy, and use the estimated value function to obtain a new policy which is greedy with respect to the estimated value function. …
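The scheme described in the abstract, Monte Carlo evaluation followed by a greedy update, can be sketched as follows. This is an illustrative version under stated assumptions, not the authors' algorithm. Rollouts are restarted from every state and truncated at a fixed horizon. The known model P, R serves both as the simulator for the rollouts and for the greedy step. The function names mc_evaluate, greedy and mc_policy_iteration, their default parameters, and the toy chain MDP are all hypothetical.

```python
import numpy as np


def mc_evaluate(policy, P, R, gamma, n_episodes=20, horizon=60, rng=None):
    """Monte Carlo estimate of V^policy: average the truncated discounted
    return of rollouts started from each state (an assumed simplification)."""
    rng = np.random.default_rng(0) if rng is None else rng
    S = P.shape[0]
    V = np.zeros(S)
    for s0 in range(S):
        total = 0.0
        for _ in range(n_episodes):
            s, G, discount = s0, 0.0, 1.0
            for _ in range(horizon):
                a = policy[s]
                G += discount * R[s, a]
                discount *= gamma
                s = rng.choice(S, p=P[s, a])  # sample the next state
            total += G
        V[s0] = total / n_episodes
    return V


def greedy(V, P, R, gamma):
    """Policy that is greedy with respect to the estimated value function V."""
    return (R + gamma * (P @ V)).argmax(axis=1)


def mc_policy_iteration(P, R, gamma, n_iters=5):
    """Return the final policy and the estimate from the last evaluation step."""
    policy = np.zeros(P.shape[0], dtype=int)  # start by always taking action 0
    for _ in range(n_iters):
        V = mc_evaluate(policy, P, R, gamma)  # evaluate from simulations
        policy = greedy(V, P, R, gamma)       # improve greedily
    return policy, V


# Hypothetical 3-state chain: action 0 = stay, action 1 = move right;
# state 2 is absorbing and pays 1 per step.
S, A = 3, 2
P = np.zeros((S, A, S))
for s in range(S):
    P[s, 0, s] = 1.0
    P[s, 1, min(s + 1, S - 1)] = 1.0
R = np.zeros((S, A))
R[S - 1, :] = 1.0

policy, V = mc_policy_iteration(P, R, gamma=0.9)
print(policy)  # [1 1 0]
```

The transitions in this chain are deterministic, so each rollout is exact up to truncation. The loop goes from "always stay" to "move right" within two iterations and then stays fixed. With stochastic transitions the Monte Carlo estimates carry sampling noise, and the greedy step acts on those noisy estimates.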