We investigate the problem of learning Linear Quadratic Regulators (LQR) in a
multi-task, heterogeneous, and model-free setting. We characterize the
stability and personalization guarantees of a policy gradient (PG)-based
Model-Agnostic Meta-Learning (MAML) (Finn et al., 2017) approach for the LQR
problem under different task-heterogeneity settings. We show that, in both the
model-based and model-free settings, the MAML-LQR approach produces a
stabilizing controller that is close to each task-specific optimal controller,
up to a task-heterogeneity bias. Moreover, in the model-based setting, we show
that the approach converges to this controller at a linear rate, improving on
the sublinear rates reported in existing MAML-LQR work. In contrast to existing MAML-LQR results,
our theoretical guarantees demonstrate that the learned controller can
efficiently adapt to unseen LQR tasks.
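
As a rough illustration of the model-based setting described above, the following is a minimal sketch of a MAML-style policy gradient loop over a handful of LQR tasks. It is not the algorithm analyzed in this work: it assumes discrete-time dynamics x_{t+1} = A x_t + B u_t with u_t = -K x_t, shares Q, R, and the initial-state covariance Sigma0 across tasks, and uses a first-order approximation of the meta-gradient (the Hessian term of full MAML is dropped). The function names, step sizes, and the two scalar tasks are all hypothetical choices made for the example.

```python
import numpy as np

def dlyap(a, q):
    """Solve X = a X a^T + q using the Kronecker (vectorised) form."""
    n = a.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(a, a), q.reshape(-1))
    return x.reshape(n, n)

def lqr_cost_and_grad(K, A, B, Q, R, Sigma0):
    """Exact cost J(K) = tr(P_K Sigma0) and its gradient for u = -K x."""
    Acl = A - B @ K
    # The closed-form expressions below are only valid for stabilizing K.
    assert np.max(np.abs(np.linalg.eigvals(Acl))) < 1.0, "K is not stabilizing"
    P = dlyap(Acl.T, Q + K.T @ R @ K)   # P = Q + K^T R K + Acl^T P Acl
    Sigma = dlyap(Acl, Sigma0)          # Sigma = Sigma0 + Acl Sigma Acl^T
    grad = 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
    return float(np.trace(P @ Sigma0)), grad

def average_cost(K, tasks, Q, R, Sigma0):
    """Average LQR cost of a single controller K over all tasks."""
    return np.mean([lqr_cost_and_grad(K, A, B, Q, R, Sigma0)[0] for A, B in tasks])

def maml_lqr(tasks, K0, Q, R, Sigma0, inner_lr=0.01, outer_lr=0.02, iters=200):
    """First-order MAML sketch: one inner PG step per task, then an outer step."""
    K = K0.copy()
    for _ in range(iters):
        meta_grad = np.zeros_like(K)
        for A, B in tasks:
            _, g = lqr_cost_and_grad(K, A, B, Q, R, Sigma0)
            K_adapted = K - inner_lr * g  # task-specific adaptation step
            # First-order approximation: gradient evaluated at the adapted
            # controller, ignoring the Hessian correction of full MAML.
            _, g_adapted = lqr_cost_and_grad(K_adapted, A, B, Q, R, Sigma0)
            meta_grad += g_adapted / len(tasks)
        K = K - outer_lr * meta_grad
    return K

# Two hypothetical scalar tasks that differ only in A (task heterogeneity).
tasks = [
    (np.array([[0.9]]), np.array([[1.0]])),
    (np.array([[1.1]]), np.array([[1.0]])),
]
Q = np.eye(1)
R = np.eye(1)
Sigma0 = np.eye(1)
K0 = np.array([[0.5]])  # assumed to stabilize every task at initialization

K_meta = maml_lqr(tasks, K0, Q, R, Sigma0)
print("meta-learned gain:", K_meta)
print("average cost before:", average_cost(K0, tasks, Q, R, Sigma0))
print("average cost after: ", average_cost(K_meta, tasks, Q, R, Sigma0))
```

The sketch requires an initial controller that stabilizes every task, which mirrors the role stability plays in the guarantees above; a single further gradient step on a new task's cost, starting from `K_meta`, would play the role of personalization.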
