A fundamental question in interpretability research is to what extent neural networks, particularly language models, implement reusable functions via subnetworks that can be composed to perform more complex tasks. Recent developments in mechanistic interpretability have made progress in identifying subnetworks, often referred to as circuits, which represent the minimal computational subgraph responsible for a model's behavior on specific tasks. However, most studies focus on identifying circuits for individual tasks without investigating how functionally similar circuits relate to each other. To address this gap, we examine the modularity of neural networks by analyzing circuits for highly compositional subtasks within a transformer-based language model. Specifically, given a probabilistic context-free grammar, we identify and compare circuits responsible for ten modular string-edit operations. Our results indicate that functionally similar circuits exhibit both notable node overlap and cross-task faithfulness. Moreover, we demonstrate that the circuits identified can be reused and combined through subnetwork set operations to represent more complex functional capabilities of the model.
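To make the notions of node overlap and subnetwork set operations concrete, the following is a minimal sketch, not the method used in this work. It assumes a circuit can be represented as a set of node identifiers; the node names, the two example circuits, and the choice of intersection-over-union as the overlap measure are all hypothetical and serve only as illustration.

```python
from typing import FrozenSet

Circuit = FrozenSet[str]  # assumption: a circuit is a set of node identifiers


def node_overlap(a: Circuit, b: Circuit) -> float:
    """Intersection-over-union of two node sets (an assumed overlap measure)."""
    union = a | b
    # Two empty circuits are treated as identical.
    return len(a & b) / len(union) if union else 1.0


# Hypothetical circuits for two functionally similar tasks.
circuit_task_a: Circuit = frozenset({"L0.H1", "L1.H3", "L2.MLP"})
circuit_task_b: Circuit = frozenset({"L0.H1", "L1.H2", "L2.MLP"})

overlap = node_overlap(circuit_task_a, circuit_task_b)

# Set operations on subnetworks: the union is a candidate circuit for a
# composite task, the intersection isolates the shared nodes.
combined = circuit_task_a | circuit_task_b
shared = circuit_task_a & circuit_task_b
```

In this toy case the two circuits share two of four distinct nodes. Whether a combined node set actually supports a composite task would have to be checked by evaluating the model restricted to that subnetwork.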

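This study investigates to what extent neural networks, particularly language models, implement reusable functions. By analyzing circuits for highly compositional subtasks in a transformer model, it finds that functionally similar circuits exhibit notable node overlap and cross-task faithfulness, and that these circuits can be reused and combined through subnetwork set operations to represent more complex functional capabilities.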

Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models