We study the gradient Expectation-Maximization (EM) algorithm for Gaussian Mixture Models (GMMs) in the over-parameterized setting, where a general GMM with $n>1$ components is fit to data generated by a single ground-truth Gaussian distribution. While results for the special case of 2-component Gaussian mixtures are well known, a global convergence analysis for arbitrary $n$ remains open and faces new technical barriers, because the convergence becomes sublinear and non-monotonic. To address these challenges, we construct a novel likelihood-based convergence analysis framework and rigorously prove that gradient EM converges globally at a sublinear rate of $O(1/\sqrt{t})$. This is the first global convergence result for Gaussian mixtures with more than $2$ components. The sublinear rate stems from the algorithmic nature of learning over-parameterized GMMs with gradient EM. We also identify a new technical challenge for learning general over-parameterized GMMs: the existence of bad local regions that can trap gradient EM for an exponential number of steps.
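To make the setting concrete, here is a minimal sketch of gradient EM on an over-parameterized GMM. It rests on assumptions the abstract does not state: equal mixing weights $1/n$, identity covariances, only the means $\mu_i$ are learned, the ground truth is $N(0, I)$ in $d = 2$, and the population expectation is replaced by a sample average. Under these assumptions the gradient of the log-likelihood with respect to $\mu_i$ is $\mathbb{E}_x[\psi_i(x)(x - \mu_i)]$, where $\psi_i$ is the posterior weight of component $i$, and each step moves every mean along that direction. The function names, initial means, and step size are hypothetical choices for illustration.

```python
import numpy as np


def responsibilities(x, mus):
    """Posterior weights psi_i(x), assuming equal weights 1/n and identity covariances."""
    sq = ((x[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)  # (N, n) squared distances
    logits = -0.5 * sq
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def gradient_em_step(x, mus, eta):
    """One gradient EM step on the means: mu_i += eta * mean_x[psi_i(x) (x - mu_i)]."""
    psi = responsibilities(x, mus)                # (N, n)
    diff = x[:, None, :] - mus[None, :, :]        # (N, n, d)
    grad = (psi[:, :, None] * diff).mean(axis=0)  # (n, d) sample estimate of the gradient
    return mus + eta * grad


def log_likelihood(x, mus):
    """Average log-likelihood of x under (1/n) * sum_i N(mu_i, I)."""
    n, d = mus.shape
    a = -0.5 * ((x[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)
    m = a.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(a - m).sum(axis=1))  # log-sum-exp over components
    return float(np.mean(lse - np.log(n) - 0.5 * d * np.log(2.0 * np.pi)))


def run(steps=500, eta=1.0, num_samples=5000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_samples, 2))  # assumed ground truth: a single N(0, I)
    mus0 = np.array([[2.0, 0.0], [-1.0, 1.5], [0.0, -2.0]])  # n = 3 > 1 components
    mus = mus0.copy()
    for _ in range(steps):
        mus = gradient_em_step(x, mus, eta)
    return log_likelihood(x, mus0), log_likelihood(x, mus), mus0, mus


if __name__ == "__main__":
    ll0, ll1, mus0, mus = run()
    print("log-likelihood:", ll0, "->", ll1)
    print("mean norms:", np.linalg.norm(mus0, axis=1), "->", np.linalg.norm(mus, axis=1))
```

Because the ground truth is a single Gaussian, any configuration with all means at its center reproduces it exactly, so in this sketch the means should drift toward the origin while the likelihood rises. The sketch only illustrates the update; it does not measure the $O(1/\sqrt{t})$ rate or reproduce the bad local regions described above.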


Toward Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixture Models