BriefGPT.xyz
Nov, 2023
Locating Cross-Task Sequence Continuation Circuits in Transformers
Michael Lan, Fazl Barez
TL;DR
Through circuit analysis and comparison across similar sequence-continuation tasks, we show that semantically related sequences rely on shared circuit subgraphs whose components play analogous roles. Documenting this shared computational structure enables better prediction of model behavior, identification of errors, and safer editing procedures, a key step toward building more robust, aligned, and interpretable language models.
Abstract
While transformer models exhibit strong capabilities on linguistic tasks, their complex architectures make them difficult to interpret. Recent work has aimed to reverse engineer transformer models into human-readable