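TL;DR: We propose a method that improves the stability of topic models by replicating LDA runs, clustering the resulting topics, and providing a topic stability metric, and we validate it on Mozilla Firefox commit messages.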
Abstract
Background: Unstructured and textual data is increasing rapidly, and Latent
Dirichlet Allocation (LDA) topic modeling is a popular data analysis method
for it. Past work suggests that instability of LDA topics may lead to
systematic errors. Aim: We propose a method that relies on repli