April 2024

Transformers Can Represent $n$-gram Language Models

Anej Svete, Ryan Cotterell

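TL;DR: This paper examines the relationship between transformer language models and $n$-gram language models. By analyzing the probabilistic representational capacity of such models, it offers an initial understanding of the mechanisms by which transformer language models represent probability distributions.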

Abstract

Plenty of existing work has analyzed the abilities of the transformer architecture by describing its representational capacity with formal models of computation. However, the focus so far has been on analyzing the architecture in terms of language acceptance. We contend that this is an ill-suited problem in the study of …
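For readers less familiar with the object in the title, the sketch below shows how an $n$-gram language model defines a probability distribution over strings rather than a yes/no acceptance decision. Each symbol, plus a final end-of-string symbol, is conditioned only on the preceding $n-1$ symbols, and the history is padded with beginning-of-string symbols. This is a minimal illustration under standard but assumed conventions: the `BOS`/`EOS` markers, the `string_prob` helper, and the toy `BIGRAM` table are hypothetical and not taken from the paper. It covers only the $n$-gram side, not the transformer construction the paper studies.

```python
from math import prod

# Hypothetical boundary symbols (an assumed convention, not from the paper).
BOS, EOS = "<bos>", "<eos>"

def string_prob(y, cond, n):
    """Probability an n-gram LM assigns to the finite string y.

    p(y) = prod_t p(y_t | y_{t-n+1..t-1}) * p(EOS | last n-1 symbols).
    `cond` maps a history tuple of length n-1 to {next_symbol: probability}.
    Unseen histories or symbols get probability 0.
    """
    padded = [BOS] * (n - 1) + list(y) + [EOS]
    factors = []
    for t in range(n - 1, len(padded)):
        history = tuple(padded[t - n + 1 : t])  # the previous n-1 symbols
        factors.append(cond.get(history, {}).get(padded[t], 0.0))
    return prod(factors)

# Toy bigram (n = 2) model over the alphabet {a, b}; numbers are illustrative.
BIGRAM = {
    (BOS,): {"a": 0.6, "b": 0.4},
    ("a",): {"a": 0.2, "b": 0.5, EOS: 0.3},
    ("b",): {"a": 0.4, "b": 0.1, EOS: 0.5},
}

print(string_prob("ab", BIGRAM, n=2))  # 0.6 * 0.5 * 0.5 -> 0.15
```

Because every row of `BIGRAM` sums to 1 and the `EOS` factor is included, the probabilities of all finite strings sum to 1, which is what makes this a language model in the probabilistic sense rather than an acceptor.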