Jun, 2014
Reinforcement and Imitation Learning via Interactive No-Regret Learning
Stephane Ross, J. Andrew Bagnell
TL;DR
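Using the analysis tools of interactive learning and no-regret online learning, this paper extends existing results and develops interactive imitation learning methods that exploit cost information. It then extends the technique to reinforcement learning, provides theoretical support for the success of online approximate policy iteration, suggests a family of new algorithms, and offers a unified view of existing techniques for imitation learning and reinforcement learning.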
Abstract
Recent work has demonstrated that problems, particularly imitation learning and structured prediction, where a learner's predictions influence the input distribution it is tested on can be naturally addressed b…