BriefGPT.xyz
Nov, 2010
Reducing Imitation Learning and Structured Prediction to No-Regret Online Learning
No-Regret Reductions for Imitation Learning and Structured Prediction
Stephane Ross, Geoffrey J. Gordon, James A. Bagnell
TL;DR
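This paper proposes a new iterative algorithm that trains a stationary deterministic policy in an online learning setting. Combined with specific reduction assumptions, it finds a policy with good performance, overcoming some shortcomings of earlier methods. Experiments show that the approach performs well on two challenging imitation learning problems and a benchmark sequence labeling problem.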
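The summary above does not spell out the steps of the iterative algorithm. One common way to realize such a scheme is dataset aggregation: roll out the current policy, ask the expert to label the states that policy actually reached, add those labels to a growing dataset, and retrain. The sketch below illustrates only that general loop under stated assumptions. The chain environment, the constants `N_STATES`, `GOAL` and `HORIZON`, the 0.2 noise level, and the helpers `expert_action`, `step`, `fit_policy`, `rollout` and `train` are all hypothetical names invented for illustration, and the tabular majority-vote learner is a stand-in for whatever learner the paper uses.

```python
import random

N_STATES = 10   # hypothetical chain of states 0..9
GOAL = 7        # hypothetical target state
HORIZON = 15    # steps per rollout


def expert_action(state):
    """Toy expert: step toward GOAL, stay put once there."""
    if state < GOAL:
        return 1
    if state > GOAL:
        return -1
    return 0


def step(state, action, rng):
    """Noisy dynamics: with probability 0.2 the action is replaced by a random one."""
    if rng.random() < 0.2:
        action = rng.choice([-1, 0, 1])
    return min(N_STATES - 1, max(0, state + action))


def fit_policy(dataset, default_action=0):
    """Tabular learner: most frequent label per state, default_action elsewhere."""
    votes = {}
    for state, action in dataset:
        counts = votes.setdefault(state, {})
        counts[action] = counts.get(action, 0) + 1
    table = {s: max(counts, key=counts.get) for s, counts in votes.items()}
    return lambda state: table.get(state, default_action)


def rollout(policy, rng, start=3):
    """Run `policy` for HORIZON steps and return the states it acted in."""
    states, state = [], start
    for _ in range(HORIZON):
        states.append(state)
        state = step(state, policy(state), rng)
    return states


def train(n_iterations=8, seed=0):
    """Aggregate expert labels on states reached by successive learned policies."""
    rng = random.Random(seed)
    dataset = []
    policy = expert_action  # the first rollout is driven by the expert itself
    for _ in range(n_iterations):
        visited = rollout(policy, rng)
        # Label the states the *current* policy reached with the expert's action.
        dataset.extend((s, expert_action(s)) for s in visited)
        # Retrain on everything collected so far.
        policy = fit_policy(dataset)
    return policy, dataset
```

Calling `train()` returns the final policy and the aggregated dataset. The point of the design is that training states come from the learner's own rollouts rather than only from expert trajectories, so the learner sees expert labels in the situations its own mistakes (here, also the noisy dynamics) lead to.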
Abstract
Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poo