Reinforcement and Imitation Learning via Interactive No-Regret Learning

Jun, 2014

Stephane Ross, J. Andrew Bagnell

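TL;DR: Building on interactive learning and the analysis of no-regret online learning, this paper extends existing results to develop interactive imitation learning methods that make use of cost information. It carries the same technique over to reinforcement learning, gives theoretical support for the success of online approximate policy iteration, suggests a family of new algorithms, and offers a unifying view of existing techniques for imitation learning and reinforcement learning.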

Abstract

Recent work has demonstrated that problems, particularly imitation learning and structured prediction, where a learner's predictions influence the input-distribution it is tested on can be naturally addressed b