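TL;DR: This paper explores a new multi-algorithm strategy that unifies several different RL and IL algorithms under a mirror descent framework, and proposes a policy-learning strategy named LOKI which, by combining IL and RL, can outperform a suboptimal expert.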
Abstract
Imitation learning (IL) consists of a set of tools that leverage expert
demonstrations to quickly learn policies. However, if the expert is suboptimal,
IL can yield policies with inferior performance compared to reinforcement
learning (RL). In this paper, we aim to provide an algorithm