BriefGPT.xyz
May, 2022
Seeded Hierarchical Clustering for Expert-Crafted Taxonomies
Anish Saha, Amith Ananthram, Emily Allaway, Heng Ji, Kathleen McKeown
TL;DR
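This paper proposes HierSeed, a weakly supervised algorithm that uses a small number of labeled examples to adaptively fit unlabeled data to an expert-crafted taxonomy. It assigns documents to topics by weighing document density against the hierarchical structure of the topics, and it outperforms both unsupervised and supervised baselines for the SHC task on three real-world datasets.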
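To make the idea of seeded assignment into a taxonomy concrete, here is a minimal sketch of a much simpler technique: nearest-centroid assignment over a two-level taxonomy, where each leaf topic's centroid is built from a few seed vectors and the score blends similarity to the leaf with similarity to its parent. This is not HierSeed's actual procedure (the summary above does not specify it, and the document-density term is omitted here). The names `assign_to_leaf`, `alpha`, and the toy vectors and topics are all hypothetical and chosen only for illustration.

```python
import math


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_to_leaf(doc, taxonomy, seeds, alpha=0.7):
    """Assign a document vector to one leaf topic.

    taxonomy: {parent_topic: [leaf_topic, ...]} (assumed two levels deep)
    seeds:    {leaf_topic: [seed_vector, ...]}  (a few labeled examples)
    alpha:    hypothetical weight on leaf similarity vs. parent similarity
    """
    best_leaf, best_score = None, -math.inf
    for parent, leaves in taxonomy.items():
        # Leaf centroids come from the seeds; the parent centroid is the
        # mean of its children's centroids.
        leaf_centroids = {leaf: mean(seeds[leaf]) for leaf in leaves}
        parent_centroid = mean(list(leaf_centroids.values()))
        for leaf, centroid in leaf_centroids.items():
            score = (alpha * cosine(doc, centroid)
                     + (1 - alpha) * cosine(doc, parent_centroid))
            if score > best_score:
                best_leaf, best_score = leaf, score
    return best_leaf


# Toy data: 3-dimensional "embeddings" for a tiny two-level taxonomy.
taxonomy = {"economy": ["taxes", "trade"], "health": ["vaccines"]}
seeds = {
    "taxes": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "trade": [[0.7, 0.7, 0.0]],
    "vaccines": [[0.0, 0.0, 1.0]],
}

print(assign_to_leaf([1.0, 0.05, 0.0], taxonomy, seeds))
print(assign_to_leaf([0.1, 0.0, 0.9], taxonomy, seeds))
```

Blending the parent similarity into the score is one simple way to let the hierarchy influence a flat nearest-centroid decision; a real system would use learned document embeddings rather than hand-written vectors.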
Abstract
Practitioners from many disciplines (e.g., political science) use expert-crafted taxonomies to make sense of large, unlabeled corpora. In this work, we study seeded hierarchical clustering (SHC): the task of auto