BriefGPT.xyz
Jul, 2023
Offline Reinforcement Learning with On-Policy Q-Function Regularization
Laixi Shi, Robert Dadashi, Yuejie Chi, Pablo Samuel Castro, Matthieu Geist
TL;DR
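Proposes two algorithms that use the Q-function of the behavior policy, through regularization, to address the extrapolation error in offline reinforcement learning caused by distribution shift in the data. The method performs well on the D4RL benchmark.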
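The general idea of regularizing a learned Q-function with a Q-function estimated for the behavior policy can be illustrated with a small tabular sketch. It is not the authors' algorithm: the function names, the SARSA-style estimate of the behavior Q-function (`q_beta`), the squared penalty weighted by `lam`, and the toy dataset are all assumptions made for illustration. In the toy data, action 1 in state 1 never appears and starts with a spuriously high value of 10, standing in for extrapolation error. Plain Q-learning (`lam=0`) propagates that value back to state 0, while the penalty pulls the estimate toward the on-policy value.

```python
import numpy as np

# Hypothetical transition format: (s, a, r, s_next, a_next, done), where a_next
# is the action the behavior policy actually took in s_next.


def estimate_behavior_q(dataset, n_states, n_actions, gamma=0.99, lr=0.1, epochs=200):
    """SARSA-style evaluation of the behavior policy from logged data (assumed estimator)."""
    q_beta = np.zeros((n_states, n_actions))
    for _ in range(epochs):
        for s, a, r, s_next, a_next, done in dataset:
            # Bootstrap with the logged next action, so only in-data actions are queried.
            target = r + (0.0 if done else gamma * q_beta[s_next, a_next])
            q_beta[s, a] += lr * (target - q_beta[s, a])
    return q_beta


def regularized_q_update(q, q_beta, dataset, gamma=0.99, lr=0.1, lam=1.0):
    """One pass of tabular Q-learning with an assumed penalty
    0.5 * lam * (Q(s, a) - Q_beta(s, a))**2 added to the squared TD error."""
    for s, a, r, s_next, _a_next, done in dataset:
        # Off-policy target: max over all actions, including never-logged ones.
        target = r + (0.0 if done else gamma * q[s_next].max())
        # Semi-gradient step (target held fixed) on the combined loss.
        q[s, a] += lr * ((target - q[s, a]) + lam * (q_beta[s, a] - q[s, a]))
    return q


# Toy dataset: state 0 -> state 1 (reward 0), then state 1 -> terminal state 2 (reward 1).
data = [(0, 0, 0.0, 1, 0, False), (1, 0, 1.0, 2, 0, True)]
q_beta = estimate_behavior_q(data, n_states=3, n_actions=2)


def train(lam, passes=500):
    q = np.zeros((3, 2))
    q[1, 1] = 10.0  # spurious value for an action that never appears in the data
    for _ in range(passes):
        q = regularized_q_update(q, q_beta, data, lam=lam)
    return q


plain = train(lam=0.0)
regularized = train(lam=1.0)
print(round(plain[0, 0], 3), round(regularized[0, 0], 3))
```

With these settings the penalty roughly halves the inflated value at state 0 (about 9.9 without it, about 5.4 with it). It damps the error rather than removing it, because the never-logged action still enters the max in the target.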
Abstract
The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distribution …