July 2023

Offline Reinforcement Learning with On-Policy Q-Function Regularization

Laixi Shi, Robert Dadashi, Yuejie Chi, Pablo Samuel Castro, Matthieu Geist

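TL;DR: Proposes two algorithms that use the behavior policy's Q-function as a regularizer to counter the extrapolation error caused by data distribution shift in offline reinforcement learning; the approach performs well on the D4RL benchmark.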
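The core idea in the TL;DR, regularizing toward the behavior policy's Q-function Q^β instead of toward the behavior policy itself, can be illustrated with a minimal tabular sketch. This is not the authors' algorithm. The squared penalty λ(Q − Q^β)², the SARSA-style estimate of Q^β, the toy dataset, the seeded overestimate for the never-logged pair (1, 0), and all function names are hypothetical choices made for illustration only. The SARSA-style target bootstraps only from (state, action) pairs that appear in the data, so Q^β never queries unseen actions.

```python
from collections import defaultdict

GAMMA = 0.9
ACTIONS = (0, 1)

# Hypothetical logged transitions (s, a, r, s_next, a_next, done); a_next is the
# action the behavior policy actually took in s_next (None when the episode ends).
DATASET = [
    (0, 0, 0.0, 1, 1, False),
    (1, 1, 1.0, None, None, True),
    (0, 1, 1.2, None, None, True),
]


def estimate_behavior_q(dataset, gamma=GAMMA, lr=0.5, epochs=500):
    """SARSA-style evaluation of the behavior policy (assumed estimator)."""
    q = defaultdict(float)
    for _ in range(epochs):
        for s, a, r, s2, a2, done in dataset:
            target = r if done else r + gamma * q[(s2, a2)]
            q[(s, a)] += lr * (target - q[(s, a)])
    return q


def regularized_q_learning(dataset, q_beta, lam, gamma=GAMMA, lr=0.1, epochs=500):
    """Tabular Q-learning on the per-sample loss
    (Q(s,a) - target)**2 + lam * (Q(s,a) - Q_beta(s,a))**2 (hypothetical form)."""
    # Hypothetical extrapolation error: the never-logged pair (1, 0) holds an
    # overestimated value, as a poorly generalizing function approximator might.
    q = defaultdict(float, {(1, 0): 2.0})
    for _ in range(epochs):
        for s, a, r, s2, a2, done in dataset:
            target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
            # Gradient step on both terms (the factor 2 is folded into lr).
            q[(s, a)] += lr * ((target - q[(s, a)]) + lam * (q_beta[(s, a)] - q[(s, a)]))
    return q


def greedy_action(q, s):
    return max(ACTIONS, key=lambda b: q[(s, b)])


q_beta = estimate_behavior_q(DATASET)
plain = regularized_q_learning(DATASET, q_beta, lam=0.0)
reg = regularized_q_learning(DATASET, q_beta, lam=4.0)
print(round(plain[(0, 0)], 3), greedy_action(plain, 0))  # 1.8 0
print(round(reg[(0, 0)], 3), greedy_action(reg, 0))      # 1.08 1
```

With λ = 0 the spurious value propagates into Q(0, 0) = 0.9 · 2.0 = 1.8, so the greedy action at state 0 becomes action 0 even though its in-data value Q^β(0, 0) = 0.9 is below Q^β(0, 1) = 1.2. With λ = 4 the fixed point is (1.8 + 4 · 0.9) / 5 = 1.08, which falls below 1.2, and the greedy action switches back to 1. Larger λ pulls the estimate closer to Q^β at the cost of staying nearer the behavior policy's values.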

Abstract

The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distr