BriefGPT.xyz
Mar, 2022
Offline Reinforcement Learning Under Value and Density-Ratio Realizability: the Power of Gaps
Jinglin Chen, Nan Jiang
TL;DR
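This work addresses sample efficiency in offline reinforcement learning. It proposes a pessimistic algorithm based on marginalized importance sampling (MIS) and provides guarantees under relatively weak function-approximation assumptions.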
Abstract
We consider a challenging theoretical problem in offline reinforcement learning (RL): obtaining sample-efficiency guarantees with a dataset lacking sufficient coverage, under only …