BriefGPT.xyz
May, 2024
Automatically Identifying Local and Global Circuits with Linear Computation Graphs
Xuyang Ge, Fukang Zhu, Wentao Shu, Junxuan Wang, Zhengfu He...
TL;DR
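Building on sparse autoencoders (SAEs) and skip SAEs, the authors introduce a circuit discovery pipeline and use a Hierarchical Attribution method to analyze three kinds of circuits in GPT-2 Small (bracket circuits, induction circuits, and indirect object identification circuits), uncovering new findings beyond existing results.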
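The property the title relies on is that, once sparse feature activations are held fixed, the path from upstream features to a downstream pre-activation is linear, so attribution becomes exact. The sketch below illustrates that idea with random weights. It is not the authors' Hierarchical Attribution method, and the module shape is an assumption: we model a skip SAE as a transcoder-style layer that reads the MLP input, predicts the MLP output from sparse features, and adds a linear skip path. The names `skip_sae`, `attribute`, `W_skip`, and `W_read` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_feat = 8, 32

# Hypothetical skip-SAE parameters (random, illustration only; not trained weights).
W_enc = rng.normal(size=(n_feat, d_model)) / np.sqrt(d_model)
b_enc = np.zeros(n_feat)
W_dec = rng.normal(size=(d_model, n_feat)) / np.sqrt(n_feat)
b_dec = np.zeros(d_model)
W_skip = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)  # assumed linear skip path


def skip_sae(x):
    """Approximate an MLP's output from its input x (assumed skip-SAE form)."""
    f = np.maximum(0.0, W_enc @ x + b_enc)  # sparse, non-negative feature activations
    y = W_dec @ f + W_skip @ x + b_dec      # reconstruction of the MLP output
    return f, y


def attribute(f_up, W_read, target):
    """Per-feature contributions to one downstream pre-activation.

    With f_up held fixed, the pre-activation W_read[target] @ (W_dec @ f_up)
    is linear in f_up, so each contribution is activation * weight and the
    contributions sum exactly (up to rounding) to that pre-activation.
    """
    direction = W_read[target] @ W_dec  # shape (n_feat,)
    return f_up * direction


x = rng.normal(size=d_model)
f, y = skip_sae(x)
W_read = rng.normal(size=(n_feat, d_model))  # hypothetical downstream encoder
contrib = attribute(f, W_read, target=3)
print(contrib.sum(), W_read[3] @ (W_dec @ f))  # equal up to floating-point rounding
```

The skip path and biases are deliberately left out of `attribute`: only the feature-to-feature path through `W_dec` is decomposed here, which is the part a feature-level circuit is built from.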
Abstract
Circuit analysis of a given model behavior is a central task in mechanistic interpretability. We introduce our circuit discovery pipeline with sparse autoencoders (SAEs) and a variant called skip SAEs. With t