BriefGPT.xyz
Mar, 2023
A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games
Anna Winnicki, R. Srikant
TL;DR
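This paper proposes a learning approach for zero-sum Markov games based on lookahead policies. The approach uses simple naive policy iteration and converges efficiently in the planning phase. The paper further establishes time-complexity and sample-complexity bounds for planning with the proposed algorithm.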
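To make the idea of naive policy iteration with lookahead concrete, here is a minimal Python sketch. It is not the authors' algorithm: the function names, the restriction to two actions per player, the toy game, and the choice to evaluate the policy pair with `m` backups starting from the lookahead value are all assumptions made for illustration. Each iteration applies the minimax Bellman operator `H - 1` times (the lookahead), takes the stage-game equilibrium strategies at every state (the greedy step), and then runs `m` backups under that fixed policy pair.

```python
# Illustrative sketch only; names, the 2-action restriction, and the toy game are assumptions.

def solve_2x2(q):
    """Solve a 2x2 zero-sum matrix game (row player maximizes).
    Returns (value, p, r): p = P(row plays action 0), r = P(column plays action 0)."""
    (a, b), (c, d) = q
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:  # a pure saddle point exists
        p = 1.0 if min(a, b) >= min(c, d) else 0.0
        r = 1.0 if max(a, c) <= max(b, d) else 0.0
        return maximin, p, r
    denom = a - b - c + d  # nonzero when there is no saddle point
    return (a * d - b * c) / denom, (d - c) / denom, (d - b) / denom

def q_matrix(s, V, R, P, gamma):
    """Stage game at state s: reward plus discounted expected next-state value."""
    n = len(V)
    return [[R[s][a][b] + gamma * sum(P[s][a][b][t] * V[t] for t in range(n))
             for b in range(2)] for a in range(2)]

def minimax_backup(V, R, P, gamma):
    """One application of the minimax Bellman operator."""
    return [solve_2x2(q_matrix(s, V, R, P, gamma))[0] for s in range(len(V))]

def policy_backup(V, pol, R, P, gamma):
    """One Bellman backup under a fixed pair of mixed policies."""
    out = []
    for s, (p, r) in enumerate(pol):
        q = q_matrix(s, V, R, P, gamma)
        out.append(p * r * q[0][0] + p * (1 - r) * q[0][1]
                   + (1 - p) * r * q[1][0] + (1 - p) * (1 - r) * q[1][1])
    return out

def lookahead_policy_iteration(R, P, gamma, H=3, m=5, iters=60):
    V = [0.0] * len(R)
    for _ in range(iters):
        W = V
        for _ in range(H - 1):  # lookahead: H - 1 minimax backups
            W = minimax_backup(W, R, P, gamma)
        # Greedy step: equilibrium strategies of each state's stage game.
        pol = [solve_2x2(q_matrix(s, W, R, P, gamma))[1:] for s in range(len(R))]
        V = W
        for _ in range(m):  # approximate evaluation of the fixed policy pair
            V = policy_backup(V, pol, R, P, gamma)
    return V, pol

# Hypothetical toy game: 2 states, 2 actions per player.
# R[s][a][b] is the reward; P[s][a][b] is the next-state distribution.
R = [[[1.0, -1.0], [-1.0, 1.0]], [[2.0, 0.0], [1.0, 3.0]]]
P = [[[[0.5, 0.5], [0.9, 0.1]], [[0.1, 0.9], [0.5, 0.5]]],
     [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.2, 0.8]]]]
gamma = 0.5
V, pol = lookahead_policy_iteration(R, P, gamma)
```

With `H = 1` and a large `m` the loop reduces to plain naive policy iteration, which is known not to converge in general for zero-sum games. The discount factor and lookahead depth in the toy run are chosen so that each iteration is a contraction toward the minimax value.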
Abstract
Many model-based reinforcement learning (RL) algorithms can be viewed as having two phases that are iteratively implemented: a learning phase where the model is approximately learned and a planning phase where th