BriefGPT.xyz
Aug, 2024
Goldfish: Monolingual Language Models for 350 Languages
Tyler A. Chang, Catherine Arnett, Zhuowen Tu, Benjamin K. Bergen
TL;DR
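This work addresses the poor performance of existing multilingual models on low-resource languages by introducing Goldfish, a suite of monolingual autoregressive Transformer language models covering up to 350 languages. Despite their small parameter counts, the Goldfish models achieve better FLORES perplexity than existing large multilingual models in 98 languages. The models serve as effective baselines and fine-tuning sources for low-resource NLP research.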
Abstract
For many low-resource languages, the only available language models are large multilingual models trained on many languages simultaneously. However, using FLORES perplexity as a metric, we find that these models ...