The exploration of language skills in language models (LMs) has always been one of the central goals in mechanistic interpretability. However, existing circuit analyses often fall short in representing the full functional scope of these models, primarily due to the exclusion of Feed-Forward layers. Additionally, isolating the effect of a single language skill from a text, which inherently involves multiple entangled skills, poses a significant challenge. To address these gaps, we introduce a novel concept, Memory Circuit, a minimum unit that fully and independently manipulates the memory-reading functionality of a language model, and disentangle the transformer model precisely into a circuit graph which is an ensemble of paths connecting different memory circuits. Based on this disentanglement, we identify salient circuit paths, named as skill paths, responsible for three crucial language skills, i.e., the Previous Token Skill, Induction Skill and In-Context Learning (ICL) Skill, leveraging causal effect estimation through interventions and counterfactuals. Our experiments on various datasets confirm the correspondence between our identified skill paths and language skills, and validate three longstanding hypotheses: 1) Language skills are identifiable through circuit dissection; 2) Simple language skills reside in shallow layers, whereas complex language skills are found in deeper layers; 3) Complex language skills are formed on top of simpler language skills. Our codes are available at: https://github.com/Zodiark-ch/Language-Skill-of-LLMs.

本研究针对现有电路分析无法全面表征语言模型（LM）功能的不足，提出了“记忆电路”的新概念，以独立操控语言模型的记忆读取功能。实验表明，我们识别的技能路径与语言技能相对应，验证了语言技能可通过电路解剖识别的假设，揭示了浅层次与深层次语言技能的分布，并表明复杂技能基于简单技能之上形成。

揭示电路下的语言技能