BriefGPT.xyz
Jan, 2022
Compressing Word Embeddings Using Syllables
Laurent Mertens, Joost Vennekens
TL;DR
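This paper studies the feasibility of using syllable embeddings, instead of the commonly used n-gram embeddings, as subword embeddings, and explores this for English and Dutch. Compared with full word embeddings, the English models retain 80% of the performance while being 20 to 30 times smaller, and the Dutch models retain 70% of the performance while being 15 times smaller. The models can be trained in a short time, but they are somewhat less accurate than the n-gram baseline.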
Abstract
This work examines the possibility of using syllable embeddings, instead of the often used $n$-gram embeddings, as subword embeddings. We investigate this for two languages: English and Dutch.
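To make the idea of syllables as subword units concrete, here is a minimal sketch in Python. It is not the authors' implementation: the syllabic decompositions in `SYLLABLES` are hand-written for illustration, the vectors are random stand-ins for trained embeddings, and composing a word vector by summing its syllable vectors is an assumption borrowed from common subword approaches, since the summary above does not state the composition rule.

```python
import random

# Hypothetical syllabic decompositions, hand-written for illustration only.
SYLLABLES = {
    "water": ["wa", "ter"],
    "waterfall": ["wa", "ter", "fall"],
    "butter": ["but", "ter"],
}

DIM = 8  # toy embedding size, chosen arbitrarily


def build_syllable_table(decompositions, dim, seed=0):
    """Give every distinct syllable a random vector (stand-in for trained embeddings)."""
    rng = random.Random(seed)
    table = {}
    for syllables in decompositions.values():
        for syl in syllables:
            if syl not in table:
                table[syl] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return table


def word_vector(word, decompositions, table):
    """Compose a word vector as the element-wise sum of its syllable vectors.

    Summation is an assumed composition rule, not one confirmed by the source.
    """
    vectors = [table[s] for s in decompositions[word]]
    return [sum(components) for components in zip(*vectors)]


syllable_table = build_syllable_table(SYLLABLES, DIM)
vec = word_vector("waterfall", SYLLABLES, syllable_table)
print(len(vec))
```

In this sketch only the syllable table is stored, and word vectors are rebuilt on demand. Storage then grows with the number of distinct syllables rather than with the vocabulary size, which is the kind of saving the summary describes.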