BriefGPT.xyz
May, 2024
Exclusively Penalized Q-learning for Offline Reinforcement Learning
Junghyuk Yeom, Yonghyeon Jo, Jungmo Kim, Sanghyeon Lee, Seungyul Han
TL;DR
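This paper proposes EPQ, a constraint-based offline reinforcement learning method that selectively penalizes states prone to estimation errors, reducing estimation bias and improving performance.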
Abstract
Constraint-based offline reinforcement learning (RL) involves policy constraints or imposing penalties on the value function to mitigate overestimation errors caused by distributional shift. This paper focuses on …