June 2020
Multi-branch Attentive Transformer
Yang Fan, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, et al.
TL;DR
This work proposes the Multi-branch Attentive Transformer (MAT), a Transformer variant whose attention layer averages the outputs of multiple parallel branches. Trained with two techniques, drop-branch (randomly dropping branches during training) and proximal initialization, MAT achieves significant improvements in experiments on machine translation, code generation, and natural language understanding.
Abstract
While the multi-branch architecture is one of the key ingredients to the success of computer vision tasks, it has not been well investigated in natural language processing, especially …
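To make the idea in the TL;DR concrete, here is a minimal PyTorch sketch of an attention layer that averages multiple branches and applies drop-branch regularization during training. The class name `MultiBranchAttention` and the parameter `p_drop` are illustrative assumptions, not the authors' code, and the use of `nn.MultiheadAttention` for each branch is a simplification of the paper's architecture.

```python
import torch
import torch.nn as nn


class MultiBranchAttention(nn.Module):
    """Sketch (not the authors' implementation): the layer output is the
    average of several parallel attention branches. At training time each
    branch is dropped independently with probability `p_drop` (drop-branch),
    and surviving outputs are rescaled as in dropout so the expected output
    matches the inference-time average."""

    def __init__(self, embed_dim: int, num_heads: int,
                 num_branches: int = 2, p_drop: float = 0.1):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
            for _ in range(num_branches)
        )
        self.p_drop = p_drop

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Self-attention in every branch; [0] selects the attended values.
        outs = torch.stack([b(x, x, x)[0] for b in self.branches])  # (N, B, T, D)
        if self.training and self.p_drop > 0:
            # Drop each branch with probability p_drop, rescale the rest.
            mask = (torch.rand(len(self.branches), device=x.device)
                    >= self.p_drop).float() / (1.0 - self.p_drop)
            outs = outs * mask.view(-1, 1, 1, 1)
        return outs.mean(dim=0)


# Usage: a 3-branch layer over a (batch, seq_len, dim) input.
layer = MultiBranchAttention(embed_dim=512, num_heads=8, num_branches=3)
x = torch.randn(4, 10, 512)
y = layer(x)  # shape (4, 10, 512)
```

Proximal initialization, the second training technique the TL;DR mentions, would plausibly correspond to initializing every branch from the weights of a trained single-branch model; it is omitted from this sketch.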