BriefGPT.xyz
Jan, 2025
SALE-Based Offline Reinforcement Learning with Ensemble Q-Networks
Zheng Chun
TL;DR
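This work addresses the challenge of handling out-of-distribution (OOD) actions in offline reinforcement learning by proposing a model-free actor-critic algorithm with ensemble Q-networks, which improves training stability and accuracy. Its key components are a gradient diversity penalty and an adjustable behavior cloning term, which suppress the overestimation of OOD action values and progressively improve the actor network. Experiments show that the algorithm converges faster and performs better on the D4RL MuJoCo benchmarks.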
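As a rough illustration of how an ensemble of Q-estimates and an adjustable behavior cloning term can be combined in an actor objective, here is a minimal sketch. It is not the paper's implementation: the function name `actor_loss`, the use of the ensemble minimum as a pessimistic aggregate, and the squared-error form of the behavior cloning term are all assumptions made for illustration, and the gradient diversity penalty mentioned in the summary is not shown.

```python
def actor_loss(q_values, policy_actions, dataset_actions, bc_weight):
    """Illustrative actor objective (hypothetical, not the paper's exact loss).

    q_values:        list of per-critic lists; q_values[k][i] is critic k's
                     estimate for the policy action on sample i.
    policy_actions:  list of action vectors proposed by the actor.
    dataset_actions: list of action vectors stored in the offline dataset.
    bc_weight:       adjustable weight on the behavior cloning term.
    """
    n = len(policy_actions)
    total = 0.0
    for i in range(n):
        # Assumption: aggregate the ensemble pessimistically with a minimum.
        q_min = min(q[i] for q in q_values)
        # Assumption: behavior cloning as squared distance to the dataset action.
        bc = sum((p - a) ** 2
                 for p, a in zip(policy_actions[i], dataset_actions[i]))
        # Maximize the pessimistic Q-value while staying close to the data.
        total += -q_min + bc_weight * bc
    return total / n


if __name__ == "__main__":
    q = [[1.0, 2.0], [0.5, 3.0]]      # two critics, two samples
    pi = [[0.0, 0.0], [1.0, 1.0]]     # actor's actions
    data = [[0.0, 1.0], [1.0, 1.0]]   # dataset actions
    print(actor_loss(q, pi, data, bc_weight=0.5))
```

A larger `bc_weight` keeps the actor closer to the dataset actions, while a smaller one lets it follow the critics more freely; tuning this trade-off is the role an adjustable behavior cloning term would play.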
Abstract
In this work, we build upon the offline reinforcement learning algorithm TD7, which incorporates State-Action Learned Embeddings (SALE) and LAP, and propose a model-free actor-critic algorithm that integrates …