Clustering is a pivotal challenge in unsupervised machine learning and is often investigated through the lens of mixture models. The optimal error rate for recovering cluster labels in Gaussian and sub-Gaussian mixture models involves ad hoc signal-to-noise ratios. Simple iterative algorithms, such as Lloyd's algorithm, attain this optimal error rate. In this paper, we first establish a universal lower bound for the error rate in clustering any mixture model, expressed through a Chernoff divergence, a more versatile measure of model information than signal-to-noise ratios. We then demonstrate that iterative algorithms attain this lower bound in mixture models with sub-exponential tails, notably emphasizing location-scale mixtures featuring Laplace-distributed errors. Additionally, for datasets better modelled by Poisson or Negative Binomial mixtures, we study mixture models whose distributions belong to an exponential family. In such mixtures, we establish that Bregman hard clustering, a variant of Lloyd's algorithm employing a Bregman divergence, is rate optimal.

聚类是无监督机器学习中的关键问题，如何通过混合模型来研究聚类是常见的。本文首先通过契诺夫散度建立了聚类任何混合模型的一个普遍下界，然后证明在具有次指数尾部的混合模型中，迭代算法可以达到这个下界；此外，对于更适合使用泊松或负二项式混合模型的数据集，我们研究了属于指数族的混合模型，在这种混合模型中，我们证明了一种改进的Lloyd算法——Bregman硬聚类，是速率最优的。

在亚指数级混合模型中实现极小化极小聚类误差的通用下界和最优速率