The disconnect between tokenizer creation and model training in language
models is known to allow certain inputs, such as the infamous
SolidGoldMagikarp token, to induce unwanted model behaviour. Such 'glitch
tokens', which are present in the tokenizer vocabulary but nearly or fully
absent from the training data, have been observed across a variety of models,
yet a consistent way of identifying them has been missing. We present a comprehensive
analysis of Large Language Model (LLM) tokenizers, specifically targeting this
issue of detecting untrained and under-trained tokens. Through a combination of
tokenizer analysis, model weight-based indicators, and prompting techniques, we
develop effective methods for automatically detecting these problematic tokens.
Our findings demonstrate the prevalence of such tokens across various models
and provide insights into improving the efficiency and safety of language
models.
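
To make the idea of a model weight-based indicator concrete, the following is a minimal sketch and not the method of this work. It assumes that under-trained tokens have embedding rows pointing in roughly the same direction as those of tokens known to be unused, and it ranks every token by cosine similarity to the mean of such reference rows. The function names, the choice of cosine similarity, the reference-token idea, and the toy embedding matrix are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_undertrained_candidates(embeddings, reference_ids):
    """Rank token ids by similarity to the mean embedding of reference tokens.

    embeddings:    list of rows, one per token id (hypothetical embedding matrix).
    reference_ids: ids of tokens assumed never to occur in training,
                   e.g. reserved or unused vocabulary slots (an assumption).
    Returns (token_id, score) pairs, most suspicious first.
    """
    dim = len(embeddings[0])
    count = len(reference_ids)
    reference_mean = [
        sum(embeddings[i][d] for i in reference_ids) / count for d in range(dim)
    ]
    scores = [
        (token_id, cosine_similarity(row, reference_mean))
        for token_id, row in enumerate(embeddings)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data only: four tokens with three-dimensional embeddings.
toy_embeddings = [
    [0.9, -0.4, 0.1],     # id 0: ordinary token
    [-0.3, 0.8, -0.5],    # id 1: ordinary token
    [0.01, 0.02, 0.01],   # id 2: reference token assumed to be unused
    [0.02, 0.04, 0.02],   # id 3: candidate aligned with the reference direction
]
ranking = rank_undertrained_candidates(toy_embeddings, reference_ids=[2])
```

In this toy setup the candidate with id 3 ranks alongside the reference token, well ahead of the two ordinary tokens. With a real model, the rows would come from its embedding or output projection weights, and any flagged candidate would still need to be confirmed by prompting the model.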

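By combining tokenizer analysis, model weight-based indicators, and prompting techniques, we develop an effective method for automatically detecting problematic tokens that are present in the tokenizer vocabulary but rarely or never seen during model training. Our findings show that such tokens are prevalent across a variety of models and offer insights into improving the efficiency and safety of language models.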