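TL;DR: By studying the behavior of transformers on simple data-generating processes, we examine tokenization from a theoretical perspective. We find that tokenization is necessary for training transformer models, and we verify that with an appropriate tokenization, transformers achieve near-optimal results when learning the probabilities of k-th order Markov sources.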
Abstract
While there has been a large body of research attempting to circumvent
tokenization for language modeling (Clark et al., 2022; Xue et al., 2022), the
current consensus is that it is a necessary initial step for d