BriefGPT.xyz
Jun, 2023
Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling
Yunfan Li, Yiran Wang, Yu Cheng, Lin Yang
TL;DR
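This paper proposes the LPO algorithm for policy optimization in reinforcement learning. It applies recent advances, including bounded eluder dimension and online sensitivity sampling, to support a degree of nonlinear function approximation, and it validates the theoretical approach with experiments using deep neural networks.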
Abstract
Policy optimization methods are powerful algorithms in reinforcement learning (RL) for their flexibility to deal with policy parameterization and ability to handle model misspecification. However, these methods u
→