BriefGPT.xyz
May, 2023
Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL
Qinghua Liu, Gellért Weisz, András György, Chi Jin, Csaba Szepesvári
TL;DR
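This paper proposes Optimistic NPG, a simple and efficient policy optimization framework. Its sample complexity has the optimal dependence on dimension, and it can efficiently learn optimal policies in linear MDPs and under function approximation.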
Abstract
While policy optimization algorithms have played an important role in recent empirical success of reinforcement learning (RL), the existing theoretical understanding of …