Exclusively Penalized Q-learning for Offline Reinforcement Learning

May 2024

Junghyuk Yeom, Yonghyeon Jo, Jungmo Kim, Sanghyeon Lee, Seungyul Han

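TL;DR: This paper proposes EPQ, a constraint-based offline reinforcement learning method that selectively penalizes states prone to estimation errors, reducing estimation bias and improving performance.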

Abstract

Constraint-based offline reinforcement learning (RL) imposes policy constraints or penalties on the value function to mitigate overestimation errors caused by distributional shift. This paper focuses on