The application of supervised learning techniques in combination with model
predictive control (MPC) has recently generated significant interest,
particularly in the area of approximate explicit MPC, where function
approximators like deep neural networks are used to learn the MPC policy via
optimal state-action pairs generated offline. While the aim of approximate
explicit MPC is to closely replicate the MPC policy, substituting online
optimization with a trained neural network, the performance guarantees that
come with solving the online optimization problem are typically lost. This
paper considers an alternative strategy, where supervised learning is used to
learn the optimal value function offline instead of learning the optimal
policy. The learned value function can then be used as the cost-to-go function
in a myopic MPC with a very short prediction horizon, so that the online
computation burden is reduced significantly without affecting the controller
performance. This approach
differs from existing work on value function approximations in the sense that
it learns the cost-to-go function by using offline-collected state-value pairs,
rather than closed-loop performance data. The cost of generating the
state-value pairs used for training is addressed using a sensitivity-based data
augmentation scheme.
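
The idea of replacing a long prediction horizon with a learned cost-to-go can be illustrated on a toy problem. The following is a minimal sketch under assumptions that are not from the paper: the system matrices `A`, `B`, the weights `Q`, `R`, the quadratic feature map, and all function names are hypothetical. A linear-quadratic problem is used because its optimal value function is known exactly, and a least-squares fit on quadratic features stands in for the neural network. The sensitivity-based data augmentation scheme is not shown.

```python
import numpy as np

# Hypothetical double-integrator dynamics and stage-cost weights (illustrative only).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])

def solve_riccati(A, B, Q, R, iters=2000):
    """Fixed-point iteration on the discrete-time algebraic Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return P

def features(X):
    """Quadratic monomials [x1^2, x1*x2, x2^2] for each row of X."""
    return np.column_stack([X[:, 0] ** 2, X[:, 0] * X[:, 1], X[:, 1] ** 2])

# Offline: sample states and label each one with its optimal value x' P x.
rng = np.random.default_rng(0)
P_true = solve_riccati(A, B, Q, R)
X_train = rng.uniform(-2.0, 2.0, size=(200, 2))
V_train = np.einsum("ni,ij,nj->n", X_train, P_true, X_train)

# Supervised learning step: fit the value function from state-value pairs.
theta, *_ = np.linalg.lstsq(features(X_train), V_train, rcond=None)
P_hat = np.array([[theta[0], theta[1] / 2.0],
                  [theta[1] / 2.0, theta[2]]])

def myopic_mpc(x):
    """One-step MPC: minimize x'Qx + u'Ru + V_hat(Ax + Bu) over u (closed form)."""
    return -np.linalg.solve(R + B.T @ P_hat @ B, B.T @ P_hat @ A @ x)

# Compare against the infinite-horizon LQR feedback at one test state.
K_lqr = np.linalg.solve(R + B.T @ P_true @ B, B.T @ P_true @ A)
x0 = np.array([1.0, -0.5])
u_myopic = myopic_mpc(x0)
u_lqr = -K_lqr @ x0
print("myopic MPC input:", u_myopic, "LQR input:", u_lqr)
```

In this special case the fitted value function is exact up to numerical error, so the one-step controller reproduces the infinite-horizon input. For a nonlinear or constrained problem the one-step minimization would need a numerical solver, and the quality of the result would depend on how well the value function is approximated.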

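The use of supervised learning techniques in combination with model predictive control (MPC) has recently attracted significant attention, particularly in the area of approximate explicit MPC, where function approximators such as deep neural networks learn the MPC policy from optimal state-action pairs generated offline. This paper considers an alternative strategy in which supervised learning is used to learn the optimal value function offline rather than the optimal policy. The learned function can serve as the cost-to-go function in a myopic MPC with a very short prediction horizon, greatly reducing the online computational burden without affecting controller performance. The approach differs from existing work on value function approximation in that it learns the cost-to-go function from offline-collected state-value pairs rather than closed-loop performance data. The cost of generating the state-value pairs used for training is addressed with a sensitivity-based data augmentation scheme.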

On Building Myopic MPC Policies using Supervised Learning

This paper studies convergence rates for some value function approximations
that arise in a collection of reproducing kernel Hilbert spaces (RKHS)
$H(\Omega)$. By casting an optimal control problem in a specific class of
native spaces, strong rates of convergence are derived for the operator
equation that enables offline approximations that appear in policy iteration.
Explicit upper bounds on the error in value function approximations are derived
in terms of the power function $\mathcal{P}_{H,N}$ for the space of
finite-dimensional approximants $H_N$ in the native space $H(\Omega)$. These
bounds are geometric
in nature and refine some well-known, now classical results concerning
convergence of approximations of value functions.
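
To make the role of the power function concrete, the sketch below evaluates it for a small example and checks the classical pointwise interpolation bound $|f(x) - (\Pi_N f)(x)| \le \mathcal{P}_{H,N}(x)\,\|f\|_{H}$, where $\Pi_N$ is the orthogonal projection onto $H_N$. It does not reproduce the paper's bounds for value functions. The exponential kernel, the length scale `ell`, the equispaced centers on $[0, 1]$, and the test function are all assumptions chosen for illustration.

```python
import numpy as np

def kernel(x, y, ell=0.5):
    """Exponential (Matern-1/2) kernel on the real line; an illustrative choice."""
    return np.exp(-np.abs(x[:, None] - y[None, :]) / ell)

def power_function(x_eval, centers):
    """P(x) = sqrt(K(x, x) - k(x)^T K_N^{-1} k(x)) for the centers spanning H_N."""
    K_N = kernel(centers, centers)
    k = kernel(centers, x_eval)  # shape (N, M)
    p2 = 1.0 - np.sum(k * np.linalg.solve(K_N, k), axis=0)  # K(x, x) = 1 here
    return np.sqrt(np.clip(p2, 0.0, None))

def interpolate(f_vals, centers, x_eval):
    """Kernel interpolant at the centers, i.e. the projection of f onto H_N."""
    coef = np.linalg.solve(kernel(centers, centers), f_vals)
    return kernel(x_eval, centers) @ coef

def f(x):
    """Test function f = K(., z) with z = 0.3, so that ||f||_H = sqrt(K(z, z)) = 1."""
    return kernel(x, np.array([0.3]))[:, 0]

x_eval = np.linspace(0.0, 1.0, 401)
results = {}
for n in (3, 5, 9):  # nested sets of equispaced centers
    centers = np.linspace(0.0, 1.0, n)
    P = power_function(x_eval, centers)
    err = np.abs(f(x_eval) - interpolate(f(centers), centers, x_eval))
    results[n] = (P, err)
    print(f"N={n}: sup power function = {P.max():.3f}, sup error = {err.max():.3f}")
```

Because the center sets are nested, the power function cannot increase as $N$ grows, and the interpolation error stays below it at every evaluation point since the test function has unit norm.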

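This paper studies convergence rates for value function approximations that arise in a collection of reproducing kernel Hilbert spaces (RKHS) $H(\Omega)$. By posing an optimal control problem in a specific class of native spaces, strong rates of convergence are derived for the offline approximations that appear in policy iteration. Explicit upper bounds on the error in value function approximations are derived in terms of the power function $\mathcal{P}_{H,N}$ for the space of finite-dimensional approximants $H_N$. These bounds are geometric in nature and refine existing results on the convergence of value function approximations.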