Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space

Apr, 2015

Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space

Arvind Neelakantan, Jeevan Shankar, Alexandre Passos, Andrew McCallum

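TL;DR: Proposes an extension of the Skip-gram model that efficiently learns multiple embeddings per word type. It performs word sense discrimination and embedding learning jointly, estimates the number of senses per word type non-parametrically, and demonstrates its scalability by training on a corpus of nearly 1 billion tokens on a single machine.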

Abstract

There is rising interest in vector-space word embeddings and their use in NLP, especially given recent methods for their fast estimation at very large scale. Nearly all this work, however, assumes a single vector