There has been much progress on efficient algorithms for clustering data
points generated by a mixture of $k$ probability distributions under the
assumption that the means of the distributions are well-separated, i.e., the
distance between the means of any two distributions is at least
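Clustering is a fundamental problem in unsupervised learning. This work introduces a simple randomized clustering algorithm whose expected running time is $O(\mathrm{nnz}(X) + n \log n)$ for arbitrary $k$, and which achieves an approximation ratio of roughly $O(k^4)$ on the $k$-means objective. Experiments show that, compared with existing methods, our clustering algorithm offers a new trade-off between running time and clustering quality.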