The Linear Quadratic Regulator (LQR) framework considers the problem of regulating a linear dynamical system perturbed by environmental noise. We compute the policy regret between three distinct control policies: i) the optimal online policy, whose linear structure is given by the Ricatti equations; ii) the optimal offline linear policy, which is the best linear state feedback policy given the noise sequence; and iii) the optimal offline policy, which selects the globally optimal control actions given the noise sequence. We fully characterize the optimal offline policy and show that it has a recursive form in terms of the optimal online policy and future disturbances. We also show that cost of the optimal offline linear policy converges to the cost of the optimal online policy as the time horizon grows large, and consequently the optimal offline linear policy incurs linear regret relative to the optimal offline policy, even in the optimistic setting where the noise is drawn i.i.d from a known distribution. Although we focus on the setting where the noise is stochastic, our results also imply new lower bounds on the policy regret achievable when the noise is chosen by an adaptive adversary.

本研究在考虑受环境噪音干扰的线性动态系统调节问题中，计算在线和离线控制策略的策略后悔。研究者在离线控制策略的优化上进行了全面的描述，并证明了离线线性策略的代价会随着时间增长而与在线策略的代价收敛，即使在噪声被选择的情况下。

LQR控制中线性控制器的威力