BriefGPT.xyz
Jun, 2021
Bellman-consistent Pessimism for Offline Reinforcement Learning
Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal
TL;DR
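This paper introduces a Bellman-consistent pessimism algorithm for offline reinforcement learning, a setting where pessimism has become a common tool for datasets that lack exhaustive exploration. The algorithm's robustness is backed by theoretical guarantees that rely only on the Bellman closedness assumption standard in the exploratory setting, and its sample complexity improves significantly on that of other algorithms.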
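The TL;DR stays at a high level, so here is an illustration only: a minimal Python sketch of the version-space idea behind Bellman-consistent pessimism, built entirely on our own assumptions. These are a hypothetical two-state, two-action toy problem, a finite tabular function class `F` on a value grid, all deterministic policies as the policy class, and a squared-Bellman-error test with tolerance `eps`. For each policy it keeps only the functions whose empirical Bellman error is within `eps`, takes the lowest predicted value at the initial state among them, and returns the policy with the highest such pessimistic value. The names `loss`, `bellman_error`, `pessimistic_value` and `select_policy`, and the toy data, are hypothetical. This is not the authors' implementation.

```python
"""Toy sketch of Bellman-consistent pessimism (hypothetical setup, not the paper's code)."""
from itertools import product

# Assumed tabular toy problem: 2 states, 2 actions, discount 0.5, initial state 0.
STATES = (0, 1)
ACTIONS = (0, 1)
GAMMA = 0.5
S0 = 0

# Assumed offline dataset of (s, a, r, s_next); s_next=None marks a terminal transition.
# The pair (s=1, a=1) never appears: the data lacks exhaustive exploration.
DATA = [
    (0, 0, 0.0, 1),
    (1, 0, 1.0, None),
    (0, 1, 0.0, None),
]

PAIRS = [(s, a) for s in STATES for a in ACTIONS]
# Assumed finite function class F: every Q-table whose entries lie on a small grid.
F = [dict(zip(PAIRS, vals)) for vals in product((0.0, 0.5, 1.0), repeat=len(PAIRS))]
# Assumed policy class: every deterministic policy, stored as a tuple indexed by state.
POLICIES = list(product(ACTIONS, repeat=len(STATES)))


def loss(g, f, pi):
    """Empirical squared loss of g against the targets r + GAMMA * f(s', pi(s'))."""
    total = 0.0
    for s, a, r, s_next in DATA:
        target = r if s_next is None else r + GAMMA * f[(s_next, pi[s_next])]
        total += (g[(s, a)] - target) ** 2
    return total / len(DATA)


def bellman_error(f, pi):
    """loss(f, f) minus the best loss any g in F achieves on f's targets (never negative)."""
    return loss(f, f, pi) - min(loss(g, f, pi) for g in F)


def pessimistic_value(pi, eps=1e-9):
    """Lowest initial-state value among functions that are eps-Bellman-consistent for pi."""
    consistent = [f for f in F if bellman_error(f, pi) <= eps]
    if not consistent:
        return float("-inf")
    return min(f[(S0, pi[S0])] for f in consistent)


def select_policy(eps=1e-9):
    """Return the policy with the highest pessimistic value, plus every policy's value."""
    values = {pi: pessimistic_value(pi, eps) for pi in POLICIES}
    return max(values, key=values.get), values


if __name__ == "__main__":
    best, values = select_policy()
    for pi, v in values.items():
        print(f"policy {pi}: pessimistic value {v}")
    print("selected policy:", best)
```

In this toy data, policy (0, 1) routes through the unseen pair (1, 1), so it is credited with the smallest value on the grid (0.0). Policy (0, 0), whose value is pinned down by the data, keeps 0.5 and is selected. Brute-force enumeration over `F` is only practical for tiny function classes.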
Abstract
The use of pessimism, when reasoning about datasets lacking exhaustive exploration, has recently gained prominence in offline reinforcement learning. Despite the robustness it adds to the algorithm, overly pessimi