BriefGPT.xyz
Jul, 2024
Transformer Layers as Painters
Qi Sun, Marc Pickett, Aakash Kumar Nain, Llion Jones
TL;DR
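Explores the roles of layers in pretrained transformer models and shows that skipping layers or running them in parallel can offer a trade-off between accuracy and latency.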
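The two interventions named in the summary, skipping layers and running layers in parallel, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `make_toy_layer` is a hypothetical stand-in for a transformer block (a simple residual scaling, not real attention), and averaging the outputs of the parallel layers is one plausible way to merge them, not necessarily the authors' exact procedure.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_toy_layer(scale: float) -> Layer:
    """Hypothetical stand-in for a transformer block: x + scale * x."""
    def layer(x: Vector) -> Vector:
        return [v + scale * v for v in x]
    return layer


def run_sequential(layers: List[Layer], x: Vector) -> Vector:
    """Baseline: apply every layer in order."""
    for layer in layers:
        x = layer(x)
    return x


def run_skipping(layers: List[Layer], x: Vector, start: int, end: int) -> Vector:
    """Drop the layers with index in [start, end) and run the rest in order."""
    return run_sequential(layers[:start] + layers[end:], x)


def run_parallel_middle(layers: List[Layer], x: Vector, start: int, end: int) -> Vector:
    """Feed the same input to layers [start, end) and average their outputs.

    Averaging is an assumption of this sketch. The middle layers no longer
    depend on each other, so they could in principle run concurrently.
    """
    x = run_sequential(layers[:start], x)
    outputs = [layer(x) for layer in layers[start:end]]
    if outputs:
        x = [sum(vals) / len(outputs) for vals in zip(*outputs)]
    return run_sequential(layers[end:], x)


# Four toy layers; the middle two are the ones skipped or parallelized.
toy_layers = [make_toy_layer(s) for s in (1.0, 0.5, 0.25, 1.0)]
toy_input = [1.0, 2.0]

baseline = run_sequential(toy_layers, toy_input)
skipped = run_skipping(toy_layers, toy_input, 1, 3)
parallel = run_parallel_middle(toy_layers, toy_input, 1, 3)
```

In this toy setting the three variants give different outputs for the same input, which mirrors the trade-off the summary describes: skipping or parallelizing layers does less sequential work, at the cost of deviating from the full model's result.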
Abstract
Despite their nearly universal adoption for large language models, the internal workings of transformers are not well understood. We aim to better understand the impact of removing or reorganizing information throughout the …