BriefGPT.xyz
Mar, 2022
Fine- and Coarse-Granularity Hybrid Self-Attention for Efficient BERT
Jing Zhao, Yifan Wang, Junwei Bao, Youzheng Wu, Xiaodong He
TL;DR
The FCA method builds a hybrid self-attention that treats informative tokens as fine-grained computation units and merges uninformative tokens into coarse-grained units, improving the computational efficiency of the Transformer while matching the original model's performance on multiple natural language processing tasks.
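The idea in the TL;DR can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the informativeness criterion here (total attention a token receives) and all function and parameter names (`fca_attention`, `keep_ratio`) are assumptions chosen for the sketch; the paper's actual scoring and pooling scheme may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fca_attention(H, Wq, Wk, Wv, keep_ratio=0.5):
    """Sketch of fine-/coarse-granularity hybrid self-attention.

    Tokens judged informative (here: by how much attention they
    receive, a stand-in for the paper's criterion) stay as individual
    fine-grained units; the rest are mean-pooled into a single coarse
    unit. Attention then runs over the shortened sequence, which is
    where the computational saving comes from.
    """
    n, d = H.shape
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    A = softmax(Q @ K.T / np.sqrt(d))           # full attention map, n x n
    score = A.sum(axis=0)                        # attention each token receives
    k = max(1, int(n * keep_ratio))
    fine_idx = np.sort(np.argsort(score)[-k:])   # informative tokens kept fine
    coarse_mask = np.ones(n, dtype=bool)
    coarse_mask[fine_idx] = False
    units = H[fine_idx]
    if coarse_mask.any():                        # pool uninformative tokens
        units = np.vstack([units, H[coarse_mask].mean(axis=0, keepdims=True)])
    Qu, Ku, Vu = units @ Wq, units @ Wk, units @ Wv
    return softmax(Qu @ Ku.T / np.sqrt(d)) @ Vu  # attention over fewer units

rng = np.random.default_rng(0)
n, d = 8, 4
H = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = fca_attention(H, Wq, Wk, Wv, keep_ratio=0.5)
print(out.shape)  # (5, 4): 4 fine units plus 1 pooled coarse unit
```

With `keep_ratio=0.5` on 8 tokens, attention runs over 5 units instead of 8; on long sequences this quadratic saving is the efficiency gain the summary refers to.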
Abstract
Transformer-based pre-trained models, such as BERT, have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications. However, deploying these models can be …