This paper studies an infinite-horizon optimal control problem for
discrete-time linear systems and quadratic criteria, both with random
parameters that are independent and identically distributed over
time. A classical approach is to solve an algebraic Riccati equation that
involves mathematical expectations and requires certain statistical information
about the parameters. In this paper, we propose an online iterative algorithm in
the spirit of Q-learning for the situation where only one random sample of
parameters emerges at each time step. The first theorem proves the equivalence
of three properties: the convergence of the learning sequence, the
well-posedness of the control problem, and the solvability of the algebraic
Riccati equation. The second theorem shows that the adaptive feedback control
in terms of the learning sequence stabilizes the system as long as the control
problem is well-posed. Numerical examples are presented to illustrate our
results.
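
To make the contrast between the two approaches concrete, the following is a minimal sketch for a scalar system x_{t+1} = a_t x_t + b_t u_t with stage cost q x_t^2 + r u_t^2. It is not the algorithm of this paper: the two-point parameter distribution, the 1/(t+1) averaging step, the assumption that the sampled pair (a_t, b_t) is observed at each step, and all function names are illustrative assumptions. The first routine uses the moments E[a^2], E[ab], E[b^2] to iterate the expected Riccati map; the second sees only one sample per step and averages the coefficients of a quadratic Q-function.

```python
import random


def riccati_map(p, m_aa, m_ab, m_bb, q, r):
    """Scalar expected Riccati operator; m_aa, m_ab, m_bb are E[a^2], E[ab], E[b^2]."""
    return q + m_aa * p - (m_ab * p) ** 2 / (r + m_bb * p)


def solve_are(m_aa, m_ab, m_bb, q, r, iters=500):
    """Classical route: fixed-point iteration that needs the moments in advance."""
    p = 0.0
    for _ in range(iters):
        p = riccati_map(p, m_aa, m_ab, m_bb, q, r)
    return p


def online_learning(sample, q, r, steps):
    """Illustrative online scheme (an assumption, not the paper's update rule).

    Keeps Q-function coefficients (h_xx, h_xu, h_uu) and moves them toward a
    target built from the single parameter sample seen at each step.
    """
    h_xx, h_xu, h_uu = q, 0.0, r
    for t in range(steps):
        a, b = sample()                    # one random sample per time step
        p = h_xx - h_xu ** 2 / h_uu        # value coefficient implied by the Q-function
        step = 1.0 / (t + 1)               # assumed averaging step size
        h_xx += step * (q + a * a * p - h_xx)
        h_xu += step * (a * b * p - h_xu)
        h_uu += step * (r + b * b * p - h_uu)
    p = h_xx - h_xu ** 2 / h_uu
    gain = -h_xu / h_uu                    # feedback u = gain * x from the learned coefficients
    return p, gain


rng = random.Random(0)


def sample():
    # Hypothetical i.i.d. parameters: a in {0.8, 1.2}, b in {0.5, 1.5}, independent.
    return rng.choice([0.8, 1.2]), rng.choice([0.5, 1.5])


# Moments of the hypothetical distribution: E[a^2] = 1.04, E[ab] = 1.0, E[b^2] = 1.25.
p_are = solve_are(1.04, 1.0, 1.25, 1.0, 1.0)
p_online, gain = online_learning(sample, 1.0, 1.0, 20000)
print("ARE solution:", p_are)
print("online estimate:", p_online, "feedback gain:", gain)
```

In this toy setting the online estimate can be compared directly against the fixed point of the expected Riccati map, which is the kind of comparison the numerical examples of a paper like this would typically report; nothing here should be read as a convergence guarantee.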

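For discrete-time linear systems and quadratic criteria with random parameters, this paper proposes an online iterative algorithm in the spirit of Q-learning to solve the infinite-horizon optimal control problem. The first theorem proves the equivalence of three properties: the convergence of the learning sequence, the well-posedness of the control problem, and the solvability of the algebraic Riccati equation. The second theorem shows that, provided the control problem is well-posed, the adaptive feedback control built from the learning sequence stabilizes the system. Numerical examples illustrate the feasibility and effectiveness of our algorithm.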