BriefGPT.xyz
Nov, 2022
Byte-Level Representations Applied to Language Modeling
Word-Level Representation From Bytes For Language Modeling
Chu-Tak Lee, Qipeng Guo, Xipeng Qiu
TL;DR
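This paper proposes a new method, Byte2Word, which introduces a cross-attention network to build word-level representations from bytes and makes sub-word-level predictions from the word-level hidden states. This yields a more compact input embedding while matching the performance of the strong baseline BERT on language modeling and text classification.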
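To make the idea of building word-level representations from bytes more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the embedding width `DIM`, the single shared `QUERY` vector, whitespace word splitting, and the helper names `word_vector` and `encode` are all assumptions made for illustration. It shows only attention-style pooling of byte embeddings into one vector per word, using a 256-row byte table in place of a large sub-word embedding table.

```python
import math
import random

DIM = 8  # hypothetical embedding width, chosen only for this sketch

rng = random.Random(0)

def rand_vec(dim):
    """Return a small random vector (stand-in for learned parameters)."""
    return [rng.uniform(-0.1, 0.1) for _ in range(dim)]

# One embedding per possible byte value: 256 rows in total.
BYTE_EMB = [rand_vec(DIM) for _ in range(256)]
# A single query vector shared by all words (an assumption of this sketch).
QUERY = rand_vec(DIM)

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def word_vector(word):
    """Pool the byte embeddings of one word into a single vector via attention."""
    keys = [BYTE_EMB[b] for b in word.encode("utf-8")]
    # Scaled dot-product score between the query and each byte embedding.
    scores = [sum(q * k for q, k in zip(QUERY, key)) / math.sqrt(DIM) for key in keys]
    weights = softmax(scores)
    # Attention-weighted sum of the byte embeddings (keys double as values here).
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(DIM)]

def encode(sentence):
    """Split on whitespace and return one pooled vector per word."""
    return [word_vector(w) for w in sentence.split()]

vectors = encode("byte level models")
print(len(vectors), len(vectors[0]))  # prints "3 8"
```

In this sketch the sequence seen by a downstream model has one position per word rather than one per byte, while the input embedding table stays at 256 rows regardless of vocabulary.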
Abstract
Modern language models mostly take sub-words as input, a design that balances the trade-off between vocabulary size, number of parameters, and performance. However, sub-word tokenization still has disadvantages l