Current decoder-based pre-trained language models (PLMs) successfully
demonstrate multilingual capabilities. However, it is unclear how these models
handle multilingualism. We analyze the neuron-level internal behavior of
multilingual decoder-based PLMs, specifically examining whether they contain
neurons that fire ``uniquely for each language''. We study six languages:
English, German, French, Spanish,
Chinese, and Japanese, and show that language-specific neurons are unique, with
a slight overlap (< 5%) between languages. These neurons are mainly distributed
in the models' first and last few layers. This trend remains consistent across
languages and models. Additionally, we tamper with fewer than 1% of the total
neurons in each model during inference and demonstrate that tampering with a
few language-specific neurons drastically changes the probability that the
target language occurs in generated text.
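
As a rough illustration of the two quantities discussed above (overlap between per-language neuron sets, and tampering with a small set of neurons at inference time), the following is a minimal sketch on toy data. It is not the procedure used in this work: the neuron indices are made up, the overlap is normalized by the smaller set as an assumption, and "tampering" is assumed here to mean overwriting the selected activations with a fixed value. The names `overlap_ratio` and `clamp_neurons` are hypothetical helpers.

```python
def overlap_ratio(neurons_a, neurons_b):
    """Share of neuron indices common to both sets.

    Assumption: normalized by the size of the smaller set.
    """
    a, b = set(neurons_a), set(neurons_b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0
    return len(a & b) / smaller


def clamp_neurons(activations, neuron_ids, value):
    """Return a copy of per-token activations with selected neurons overwritten.

    activations: list of rows (one per token), each a list of floats.
    Assumption: tampering is modeled as forcing a fixed activation value.
    """
    ids = set(neuron_ids)
    return [
        [value if i in ids else x for i, x in enumerate(row)]
        for row in activations
    ]


# Hypothetical neuron index sets for two languages (illustrative only).
english = [0, 3, 7, 12]
german = [3, 5, 9, 14]
print(overlap_ratio(english, german))  # 0.25

# Toy activations: 2 tokens, 6 neurons each.
acts = [[0.0] * 6 for _ in range(2)]
patched = clamp_neurons(acts, [1, 4], 1.0)
print(patched[0])  # [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
```

In a real model the same overwrite would typically be applied to hidden activations inside the forward pass rather than to plain lists; that wiring is left out here because it depends on the model implementation.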
