Sample efficiency is critical in solving real-world reinforcement learning problems, where agent-environment interactions can be costly. Imitation learning from expert advice has proved to be an effective strategy for reducing the number of interactions required to train a policy. Online imitation learning, a specific type of imitation learning that interleaves policy evaluation and policy optimization, is a particularly effective framework for training policies with provable performance guarantees. In this work, we seek to further accelerate the convergence rate of online imitation learning, making it more sample efficient. We propose two model-based algorithms inspired by Follow-the-Leader (FTL) with prediction: MoBIL-VI based on solving variational inequalities and MoBIL-Prox based on stochastic first-order updates. When a dynamics model is learned online, these algorithms can provably accelerate the best known convergence rate up to an order. Our algorithms can be viewed as a generalization of stochastic Mirror-Prox by Juditsky et al. (2011), and admit a simple constructive FTL-style analysis of performance. The algorithms are also empirically validated in simulation.

本文介绍两种基于模型的算法，利用 Follow-the-Leader（FTL）规则来提高在线模仿学习系统的收敛速度，其中 MoBIL-VI 算法基于解决变分不等式，而 MoBIL-Prox 算法基于随机一阶更新，这两种方法都利用模型来预测未来的梯度，可以使该学习算法的样本利用率更高。

利用预测模型加速模仿学习