BriefGPT.xyz
Feb, 2022
Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning
Chenjia Bai, Lingxiao Wang, Zhuoran Yang, Zhihong Deng, Animesh Garg...
TL;DR
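This paper proposes Pessimistic Bootstrapping for offline RL (PBRL), a purely uncertainty-driven offline policy learning algorithm. PBRL quantifies uncertainty through an uncertainty measure on the Q-function and uses it to perform pessimistic updates, addressing the extrapolation error caused by data outside the behavior distribution in offline learning. Experiments show that PBRL outperforms existing algorithms.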
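The idea of a pessimistic, uncertainty-penalized update can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes, for illustration only, that uncertainty is measured as the standard deviation across an ensemble of Q-estimates and that the penalty is subtracted from the bootstrapped target. The function name `pessimistic_target` and the parameters `gamma` and `beta` are hypothetical.

```python
from statistics import mean, pstdev


def pessimistic_target(reward, next_q_values, gamma=0.99, beta=1.0):
    """Return an uncertainty-penalized TD target (illustrative sketch).

    next_q_values holds Q-estimates for the same next state-action pair,
    one per ensemble member (an assumed setup, not taken from the paper).
    """
    # Assumption: disagreement across the ensemble stands in for uncertainty.
    uncertainty = pstdev(next_q_values)
    # Pessimism: lower the bootstrapped value in proportion to uncertainty.
    return reward + gamma * (mean(next_q_values) - beta * uncertainty)


# Ensemble members agree (as one might expect on in-distribution data).
in_dist = pessimistic_target(1.0, [5.0, 5.0, 5.0])
# Ensemble members disagree (as one might expect on out-of-distribution data).
ood = pessimistic_target(1.0, [2.0, 5.0, 8.0])
```

Both ensembles have the same mean estimate, but the target for the second one is lower because its members disagree. In this sketch, that penalty is what discourages the learned policy from relying on poorly supported actions.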
Abstract
Offline reinforcement learning (RL) aims to learn policies from previously collected datasets without exploring the environment. Directly applying off-policy algorithms to offline RL usually fails due to the...