BriefGPT.xyz
Jul, 2023
LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning
Outongyi Lv, Bingxin Zhou, Yu Guang Wang
TL;DR
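This study analyzes the distributional properties of the Bellman approximation error in online and offline settings and proposes a new loss function, LLoss, which has smaller variance. Experiments also confirm that rewards in offline datasets should follow a specific distribution, which offers useful insight for further research.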
Abstract
Currently, research on reinforcement learning (RL) can be broadly classified into two categories: online RL and offline RL. Both in online and offline RL, the primary focus of research on the Bellman error lies i…