Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale

Mar, 2023

Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale

Botao Hao, Rahul Jain, Dengwang Tang, Zheng Wen

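TL;DR: This work proposes iRLSVI, an RL algorithm that leverages an offline dataset and combines RL with imitation learning, and that can significantly reduce regret.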

Abstract

In this paper, we address the following problem: Given an offline demonstration dataset from an imperfect expert, what is the best way to leverage it to bootstrap online learning performance in MDPs. We first pro