Automatically Identifying Local and Global Circuits with Linear Computation Graphs

May, 2024

Xuyang Ge, Fukang Zhu, Wentao Shu, Junxuan Wang, Zhengfu He...

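TL;DR: Building on sparse autoencoders (SAEs) and skip SAEs, the authors introduce a circuit discovery pipeline. They use a Hierarchical Attribution method to analyze three types of circuits in GPT-2 Small (the bracket, induction, and indirect object identification circuits), uncovering new findings that lie beneath previously reported ones.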
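The key property behind such a pipeline can be illustrated with a toy example. When SAEs are inserted and the path between them is linear, a downstream feature's pre-activation splits *exactly* into one contribution per upstream feature plus a bias term. The sketch below is a minimal illustration under that assumption, not the authors' implementation. The function names (`make_sae`, `encode`, `decode`, `attribute`) and the random weights are hypothetical. The sketch also shows only a single attribution hop. A hierarchical procedure would presumably apply this step recursively, pruning small contributions at each level.

```python
import numpy as np

def make_sae(d_model, n_features, rng):
    # Hypothetical toy SAE: ReLU encoder, linear decoder, random weights.
    return {
        "W_enc": rng.normal(size=(n_features, d_model)) / np.sqrt(d_model),
        "b_enc": 0.1 * rng.normal(size=n_features),
        "W_dec": rng.normal(size=(d_model, n_features)) / np.sqrt(n_features),
        "b_dec": np.zeros(d_model),
    }

def encode(sae, x):
    # Feature activations f = ReLU(W_enc (x - b_dec) + b_enc).
    return np.maximum(sae["W_enc"] @ (x - sae["b_dec"]) + sae["b_enc"], 0.0)

def decode(sae, f):
    # Reconstruction x_hat = W_dec f + b_dec, which is linear in f.
    return sae["W_dec"] @ f + sae["b_dec"]

def attribute(up, down, f_up, j):
    """Split downstream feature j's pre-activation into per-upstream-feature terms.

    Assumption: the path from the upstream reconstruction to the downstream
    encoder is linear (here simply the identity), so the split is exact.
    """
    w = down["W_enc"][j]                      # encoder row of downstream feature j
    contrib = f_up * (up["W_dec"].T @ w)      # c_i = f_i * <W_enc[j], W_dec[:, i]>
    bias = w @ (up["b_dec"] - down["b_dec"]) + down["b_enc"][j]
    return contrib, bias

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    up, down = make_sae(16, 32, rng), make_sae(16, 32, rng)
    f_up = encode(up, rng.normal(size=16))
    j = 3
    contrib, bias = attribute(up, down, f_up, j)
    # Recompute the pre-activation directly and check the decomposition is exact.
    z_j = down["W_enc"][j] @ (decode(up, f_up) - down["b_dec"]) + down["b_enc"][j]
    assert np.isclose(contrib.sum() + bias, z_j)
    # Upstream features with the largest |contribution| are candidate circuit nodes.
    print(np.argsort(-np.abs(contrib))[:5])
```

Because every term is linear in the upstream activations, inactive upstream features contribute exactly zero. Ranking by `|contrib|` is one simple pruning rule. The criterion the authors actually use may differ.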

Abstract

Circuit analysis of any given model behavior is a central task in mechanistic interpretability. We introduce a circuit discovery pipeline built on sparse autoencoders (SAEs) and a variant called skip SAEs. With t