Jan, 2016

Dual-tree $k$-means with bounded iteration runtime

Ryan R. Curtin

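TL;DR: This paper proposes a dual-tree algorithm that accelerates the iterations of k-means clustering when both the number of clusters k and the dataset are large. When cover trees are used, a single iteration of the algorithm runs in $O(N + k \log k)$ time, and the algorithm performs well in practice.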

Abstract

k-means is a widely used clustering algorithm, but for $k$ clusters and a dataset size of $N$, each iteration of Lloyd's algorithm costs $