BriefGPT.xyz
Jun, 2021
Thinking Like Transformers
Gail Weiss, Yoav Goldberg, Eran Yahav
TL;DR
This paper proposes a computational model that maps the basic components of the Transformer encoder, attention and feed-forward computation, onto simple primitives, forming a programming language, RASP, for writing programs that solve tasks a Transformer could conceivably learn. It demonstrates how to train Transformers to mimic RASP solutions, and uses the model to analyze the number of layers and attention heads a task requires.
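To make the mapping concrete, here is a minimal sketch (not the authors' RASP implementation; the function names and list-based representation are assumptions for illustration) of RASP's two core primitives: `select`, which builds a boolean attention pattern from a key/query predicate, and `aggregate`, which averages the selected values at each position. Sequence reversal, a classic RASP example, falls out of composing them.

```python
def select(keys, queries, predicate):
    # Boolean "attention matrix": entry [q][k] is True when the
    # predicate holds between key k and query q.
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(selection, values):
    # Uniform average of the selected values at each query position,
    # mirroring averaging attention weights over selected keys.
    out = []
    for row in selection:
        picked = [v for v, sel in zip(values, row) if sel]
        if len(picked) == 1:
            out.append(picked[0])           # single selection: pass through
        elif picked:
            out.append(sum(picked) / len(picked))
        else:
            out.append(None)                # nothing selected
    return out

def reverse(tokens):
    # Attend from position i to position n-1-i, then read off the value.
    n = len(tokens)
    indices = list(range(n))
    flipped = select(indices, [n - 1 - i for i in indices],
                     lambda k, q: k == q)
    return aggregate(flipped, tokens)
```

For example, `reverse(list("abc"))` selects, at each output position `i`, exactly the input position `n-1-i`, yielding the reversed sequence. In the paper's framing, each `select`/`aggregate` pair corresponds to one attention head, which is what makes layer and head counts readable off a RASP program.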
Abstract
What is the computational model behind a transformer? Where recurrent neural networks have direct parallels in finite state machines, allowing clear discussion and thought around architecture variants or trained …