Apr, 2019
Information Aggregation for Multi-Head Attention with Routing-by-Agreement
Jian Li, Baosong Yang, Zi-Yi Dou, Xing Wang, Michael R. Lyu...
TL;DR
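This work aims to improve the expressive power of information aggregation in multi-head attention. A routing-by-agreement algorithm iteratively updates the proportion of each internal representation that is assigned to the final representation. Experimental results show that the improved algorithm outperforms the conventional linear transformation approach.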
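The summary does not spell out the routing procedure, so the following is only a minimal NumPy sketch of generic routing-by-agreement in the style of capsule-network dynamic routing, applied to attention-head outputs. The function names (`squash`, `route_heads`), the vote projections `W`, the tensor sizes, and the choice of three iterations are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def squash(v, axis=-1, eps=1e-9):
    """Shrink each vector's norm into [0, 1) while keeping its direction."""
    sq_norm = np.sum(v * v, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * v / np.sqrt(sq_norm + eps)

def softmax(x, axis):
    x = x - np.max(x, axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def route_heads(votes, num_iters=3):
    """Aggregate parts into wholes by iterative agreement.

    votes: array of shape (n_in, n_out, d), the vote of each input part
           (here: an attention head) for each output vector.
    Returns (outputs of shape (n_out, d), coupling of shape (n_in, n_out)).
    """
    n_in, n_out, _ = votes.shape
    logits = np.zeros((n_in, n_out))  # start from a uniform assignment
    for _ in range(num_iters):
        # Proportion of each input part assigned to each output.
        coupling = softmax(logits, axis=1)
        # Each output is the squashed, coupling-weighted sum of its votes.
        outputs = squash(np.einsum("io,iod->od", coupling, votes))
        # Raise the logit where a vote agrees with the current output.
        logits = logits + np.einsum("iod,od->io", votes, outputs)
    return outputs, coupling

# Hypothetical sizes: 8 heads of width 64 routed to 4 outputs of width 16.
rng = np.random.default_rng(0)
heads = rng.normal(size=(8, 64))
W = rng.normal(size=(8, 4, 64, 16)) * 0.1      # assumed vote projections
votes = np.einsum("ik,iokd->iod", heads, W)
outputs, coupling = route_heads(votes)
final = outputs.reshape(-1)                    # concatenated final representation
```

Each row of `coupling` sums to 1, so it can be read as the share of one head's information sent to each output. The conventional baseline mentioned in the TL;DR would instead apply a single fixed linear transformation to the head outputs, with no iterative reassignment.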
Abstract
Multi-head attention is appealing for its ability to jointly extract different types of information from multiple representation subspaces. Concerning the information aggregation, a common practice is to use a co…