We analyze online \cite{BottouBengio} and mini-batch \cite{Sculley} $k$-means variants. Both scale up the widely used $k$-means algorithm via stochastic approximation, and have become popular for large-scale clustering and unsupervised feature learning. We show, for the first time, that starting from any initial solution, they converge to a "local optimum" at rate $O(\frac{1}{t})$ (in terms of the $k$-means objective) under general conditions. In addition, we show that if the dataset is clusterable and the algorithm is initialized with a simple and scalable seeding algorithm, mini-batch $k$-means converges to an optimal $k$-means solution at rate $O(\frac{1}{t})$ with high probability. The $k$-means objective is non-convex and non-differentiable: we exploit ideas from recent work on stochastic gradient descent for non-convex problems \cite{ge:sgd_tensor, balsubramani13} by providing a novel characterization of the trajectory of the $k$-means algorithm on its solution space, and we circumvent the non-differentiability problem via geometric insights about the $k$-means update.
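
To make the kind of update under analysis concrete, the following is a minimal Python sketch of a mini-batch $k$-means iteration with a per-center step size that decays like $1/t$. It is an illustration under stated assumptions, not the exact procedure analyzed here: the function names `minibatch_kmeans` and `kmeans_cost`, the batch size, and the uniform random seeding are hypothetical choices, and the seeding in particular is not the seeding algorithm referred to in the abstract.

```python
import numpy as np


def minibatch_kmeans(X, k, batch_size=32, n_iters=200, seed=0):
    """Illustrative mini-batch k-means with per-center 1/count step sizes."""
    rng = np.random.default_rng(seed)
    # Assumption: seed with k distinct data points chosen uniformly at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    counts = np.zeros(k)  # points absorbed by each center so far
    for _ in range(n_iters):
        # Sample a mini-batch with replacement.
        batch = X[rng.integers(0, len(X), size=batch_size)]
        # Assign each sampled point to its nearest current center.
        d2 = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in np.unique(labels):
            pts = batch[labels == j]
            counts[j] += len(pts)
            eta = len(pts) / counts[j]  # step size shrinks as counts grow
            # Convex combination: each center is a running mean of its points.
            centers[j] = (1 - eta) * centers[j] + eta * pts.mean(axis=0)
    return centers


def kmeans_cost(X, centers):
    """Average squared distance from each point to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()
```

With this step size, each center is the running average of all points assigned to it so far, which is one simple way to realize a stochastic-approximation update whose learning rate decays like $1/t$. Setting `batch_size=1` gives an online-style variant of the same sketch.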

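This paper studies online and mini-batch $k$-means variants, which are used for large-scale clustering and unsupervised feature learning. By characterizing the trajectory of the algorithms on their solution space and exploiting geometric insights, it overcomes the non-convexity and non-differentiability of the $k$-means objective and proves that, under general conditions, the algorithms converge to a local optimum at rate $O(\frac{1}{t})$. On clusterable datasets, mini-batch $k$-means additionally converges to an optimal $k$-means solution with high probability.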

Convergence rate of stochastic k-means