Researchers have demonstrated state-of-the-art performance in sequential
decision making problems (e.g., robotics control, sequential prediction) with
deep neural network models. One often has access to near-optimal oracles that
achieve good performance on the task during training. We demonstrate that
AggreVaTeD --- a policy gradient extension of the Imitation Learning (IL)
approach of (Ross & Bagnell, 2014) --- can leverage such an oracle to achieve
faster and better solutions with less training data than a less-informed
Reinforcement Learning (RL) technique. Using both feedforward and recurrent
neural network predictors, we present stochastic gradient procedures on a
sequential prediction task, dependency-parsing from raw image data, as well as
on various high dimensional robotics control problems. We also provide a
comprehensive theoretical study of IL that demonstrates we can expect up to
exponentially lower sample complexity for learning with AggreVaTeD than with RL
algorithms, which backs our empirical findings. Our results and theory indicate
that the proposed approach can achieve superior performance with respect to the
oracle when the demonstrator is sub-optimal.

使用 Imitation Learning 的 Policy Gradient Extension 能够充分利用优秀的预测模型，在深度神经网络处理的机器人控制及序列预测任务上比弱化的 Reinforcement Learning 更高效、损失较小，其 IL 的理论研究展现 AggreVaTeD 比其他 RL 算法更少的样本能达到更优质的性能