BriefGPT.xyz
Jun, 2023
Optimism and Adaptivity in Policy Optimization
Veronica Chelu, Tom Zahavy, Arthur Guez, Doina Precup, Sebastian Flennerhag
TL;DR
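This work strengthens policy optimization through optimism and adaptivity, recasting seemingly unrelated algorithms as the repeated application of two interleaved steps, and designs an adaptive, optimistic policy gradient algorithm implemented via meta-gradient learning.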
Abstract
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) through optimism & …