BriefGPT.xyz
Sep, 2023
One Wide Feedforward is All You Need
Telmo Pessoa Pires, António V. Lopes, Yannick Assogba, Hendra Setiawan
TL;DR
By removing the FFNs from the decoder layers and sharing a single FFN across the encoder, we substantially reduce the parameter count with only a modest drop in accuracy. We then restore the architecture to its original size by widening the shared FFN's hidden dimension, yielding substantial gains in both accuracy and latency.
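The core idea in the TL;DR, sharing one (possibly widened) FFN across all encoder layers while keeping per-layer attention, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the class and parameter names (`SharedFFNEncoder`, `d_ff`, etc.) are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class SharedFFNEncoder(nn.Module):
    """Encoder stack in which every layer reuses one FFN instance (illustrative sketch)."""

    def __init__(self, d_model=512, n_heads=8, n_layers=6, d_ff=2048):
        super().__init__()
        # One FFN shared by all layers; widening d_ff can restore the
        # parameter budget lost by not having per-layer FFNs.
        self.shared_ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        # Attention and layer norms remain per-layer.
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        self.norm1 = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(n_layers))
        self.norm2 = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(n_layers))

    def forward(self, x):
        for attn, n1, n2 in zip(self.attn, self.norm1, self.norm2):
            a, _ = attn(x, x, x)
            x = n1(x + a)
            x = n2(x + self.shared_ffn(x))  # same FFN weights at every layer
        return x

enc = SharedFFNEncoder()
out = enc(torch.randn(2, 10, 512))  # (batch, sequence, d_model)
```

Because `shared_ffn` is a single module referenced in every layer iteration, its two linear projections are counted once, so the FFN parameter cost no longer scales with depth.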
Abstract
The transformer architecture has two main non-embedding components: attention and the feed forward network (FFN).