BriefGPT.xyz
May, 2025
Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders
Boyi Deng, Yu Wan, Yidan Zhang, Baosong Yang, Fuli Feng
TL;DR
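This study examines the mechanisms behind the multilingual capabilities of large language models and proposes sparse autoencoders (SAEs) as a more reliable analysis tool that overcomes the limitations of traditional methods. It finds that features obtained from SAEs are closely tied to specific languages, and that selectively removing these features significantly improves the language control of large language models.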
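To make the idea of "selectively removing" an SAE feature concrete, here is a minimal sketch of one common way to do it: encode an activation vector with a ReLU encoder, then subtract the chosen feature's decoded contribution. This summary does not spell out the authors' exact procedure, so the function names (`sae_encode`, `ablate_feature`), the weight layout, and the random weights are all assumptions made for illustration.

```python
import numpy as np


def sae_encode(x, W_enc, b_enc):
    """Return non-negative SAE feature activations for one activation vector.

    Assumed shapes (hypothetical): x is (d_model,), W_enc is
    (d_model, n_features), b_enc is (n_features,).
    """
    return np.maximum(0.0, x @ W_enc + b_enc)


def ablate_feature(x, W_enc, b_enc, W_dec, feature_idx):
    """Subtract one feature's decoded contribution from x.

    W_dec is assumed to be (n_features, d_model), with row i holding the
    decoder direction of feature i. If the feature is inactive (activation
    0), x is returned unchanged.
    """
    acts = sae_encode(x, W_enc, b_enc)
    return x - acts[feature_idx] * W_dec[feature_idx]


if __name__ == "__main__":
    # Random stand-ins for a trained SAE; a real analysis would load
    # trained weights and real model activations instead.
    rng = np.random.default_rng(0)
    d_model, n_features = 8, 32
    W_enc = rng.normal(size=(d_model, n_features))
    b_enc = np.zeros(n_features)
    W_dec = rng.normal(size=(n_features, d_model))
    x = rng.normal(size=d_model)

    acts = sae_encode(x, W_enc, b_enc)
    strongest = int(np.argmax(acts))  # pretend this is a language-specific feature
    x_ablated = ablate_feature(x, W_enc, b_enc, W_dec, strongest)
    print("ablated feature:", strongest)
    print("change in activation norm:", np.linalg.norm(x - x_ablated))
```

In a real setting the ablated vector would be written back into the model's forward pass so that the effect on generated text in each language can be measured.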
Abstract
The mechanisms behind multilingual capabilities in Large Language Models (LLMs) have been examined using neuron-based or internal-activation-based methods. However, these methods often face challenges such as sup