We study the following question in the context of imitation learning for
continuous control: how are the underlying stability properties of an expert
policy reflected in the sample complexity of an imitation learning task? We
provide the first results showing that a surprisingly granular connection can
be made between the underlying expert system's incremental gain stability, a
novel measure of robust convergence between pairs of system trajectories, and
the dependency on the task horizon $T$ of the resulting generalization bounds.
In particular, we propose and analyze incremental gain stability constrained
versions of behavior cloning and a DAgger-like algorithm, and show that the
resulting sample-complexity bounds naturally reflect the underlying stability
properties of the expert system. As a special case, we delineate a class of
systems for which the number of trajectories needed to achieve
$\varepsilon$-suboptimality is sublinear in the task horizon $T$, and do so
without requiring (strong) convexity of the loss function in the policy
parameters. Finally, we conduct numerical experiments demonstrating the
validity of our insights both on a simple nonlinear system for which the
underlying stability properties can be easily tuned and on a high-dimensional
quadrupedal robotic simulation.

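In imitation learning, the stability of the expert policy has a marked effect on the sample complexity of the imitation learning task. This paper proposes incremental gain stability constrained versions of behavior cloning and the DAgger algorithm, and experimentally validates the relationship between the task-horizon-dependent generalization bounds and the stability of the system.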