BriefGPT.xyz
Jun, 2020
GMAT:Transformer模型的全局记忆增强
GMAT: Global Memory Augmentation for Transformers
Ankit Gupta, Jonathan Berant
TL;DR
This paper proposes adding a dense attention mechanism over a global memory alongside sparse Transformer blocks, greatly improving the model's efficiency and performance when processing long documents.
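The core idea can be sketched in code: a small set of global memory tokens attends densely to the whole sequence, while ordinary tokens attend only to a local window plus the memory. The sketch below is a minimal numpy illustration under stated assumptions, not the paper's implementation; the function name, the omitted projections, and the sliding-window sparsity pattern are all illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gmat_attention(seq, mem, window=1):
    """Sketch of global-memory-augmented attention (hypothetical API).

    seq: (n, d) input token states; mem: (m, d) global memory states.
    Memory tokens attend densely to everything; input tokens attend
    only to a local window plus the memory (a sparse pattern).
    Query/key/value projections are omitted for brevity.
    """
    n, d = seq.shape
    m = mem.shape[0]
    x = np.concatenate([mem, seq], axis=0)        # (m+n, d)
    scores = x @ x.T / np.sqrt(d)                 # (m+n, m+n)

    # Attention mask: memory rows and columns are fully dense.
    mask = np.zeros((m + n, m + n), dtype=bool)
    mask[:m, :] = True                            # memory -> everything
    mask[:, :m] = True                            # everyone -> memory
    for i in range(n):                            # local window for tokens
        lo, hi = max(0, i - window), min(n, i + window + 1)
        mask[m + i, m + lo:m + hi] = True

    scores = np.where(mask, scores, -1e9)
    out = softmax(scores, axis=-1) @ x            # (m+n, d)
    return out[m:], out[:m]                       # updated seq, updated memory
```

With a constant number of memory tokens and a fixed window, each token row attends to O(window + m) positions instead of O(n), which is the source of the efficiency gain on long inputs.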
Abstract
Transformer-based models have become ubiquitous in natural language processing thanks to their large capacity, innate parallelism, and high performance. The contextualizing component of a Transformer block is the …