BriefGPT.xyz
Feb, 2024
Knowledge of Pretrained Language Models on Surface Information of Tokens
Tatsuya Hiraoka, Naoaki Okazaki
TL;DR
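Pretrained language models have knowledge of the surface information of tokens, including token length and substrings. For knowledge of token constitution, however, there is a bottleneck in how effectively the models can use it.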
Abstract
Do pretrained language models have knowledge regarding the surface information of tokens? We examined the surface information stored in wo