BriefGPT.xyz
Nov, 2019
Improving Transformer Models by Reordering their Sublayers
Ofir Press, Noah A. Smith, Omer Levy
TL;DR
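This work studies how the ordering of sublayers in multilayer Transformers affects performance, proposes a new ordering called the sandwich transformer, and verifies its performance advantage on several language modeling benchmarks.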
Abstract
Multilayer transformer networks consist of interleaved self-attention and feedforward sublayers. Could ordering the sublayers in a differe…
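To make the idea of sublayer ordering concrete, here is a minimal sketch that writes a model as a string of 's' (self-attention) and 'f' (feedforward) sublayers. The interleaved baseline follows the abstract directly. The sandwich pattern used here (k self-attention sublayers first, then interleaved pairs, then k feedforward sublayers) is an assumption about how the sandwich transformer is laid out, since the excerpt above does not spell it out. The function names and the parameter k are hypothetical and chosen only for illustration.

```python
def interleaved_ordering(n: int) -> str:
    """Baseline from the abstract: n strictly alternating (s, f) pairs."""
    return "sf" * n


def sandwich_ordering(n: int, k: int) -> str:
    """Assumed sandwich pattern: k 's' sublayers, then n - k interleaved
    (s, f) pairs, then k 'f' sublayers. The total sublayer count (2n) and
    the number of each sublayer type (n) match the baseline."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return "s" * k + "sf" * (n - k) + "f" * k


if __name__ == "__main__":
    print(interleaved_ordering(4))  # sfsfsfsf
    print(sandwich_ordering(4, 2))  # sssfsfff
```

With k = 0 the assumed sandwich ordering reduces to the interleaved baseline, so the two orderings differ only in how the same set of sublayers is arranged, not in parameter count.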