BriefGPT.xyz
March 2023
Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale
Botao Hao, Rahul Jain, Dengwang Tang, Zheng Wen
TL;DR
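This work proposes iRLSVI, an algorithm that uses an offline dataset and combines RL with imitation learning, and that can significantly reduce regret.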
Abstract
In this paper, we address the following problem: Given an offline demonstration dataset from an imperfect expert, what is the best way to leverage it to bootstrap online learning performance in MDPs. We first pro