BriefGPT.xyz
Jun, 2023
Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic
Tianying Ji, Yu Luo, Fuchun Sun, Xianyuan Zhan, Jianwei Zhang...
TL;DR
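Proposes the Blended Exploitation and Exploration (BEE) algorithm to address the Q-value underestimation that arises in the later stages of reinforcement learning training, offering high sample efficiency and practicality.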
Abstract
Learning high-quality Q-value functions plays a key role in the success of many modern off-policy deep reinforcement learning (RL) algorithms. Previous works focus on addressing the value overestimation issue, an ...