BriefGPT.xyz
Jan, 2020
Reformer: The Efficient Transformer
Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya
TL;DR
This paper introduces two techniques to make the Transformer more efficient: replacing dot-product attention with locality-sensitive hashing, and using reversible residual layers instead of standard residual layers, which reduces how often activations must be stored. The improved model, the Reformer, is far more efficient than the Transformer when processing long sequences.
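To make the bucketing idea behind locality-sensitive hashing concrete, here is a minimal sketch in plain NumPy (not the paper's code): vectors are projected onto a few random directions, and each vector is assigned to the bucket given by the argmax over the projected and negated-projected coordinates, so that similar queries and keys tend to share a bucket and attention only needs to be computed within buckets. The function name `lsh_buckets` and its parameters are illustrative assumptions, not identifiers from the paper.

```python
# Minimal sketch (not the paper's implementation) of angular LSH bucketing:
# nearby vectors tend to fall into the same bucket, so attention can be
# restricted to pairs within a bucket instead of all pairs.
import numpy as np

def lsh_buckets(vectors, n_buckets=8, seed=0):
    """Assign each vector to a bucket via random-rotation (angular) LSH."""
    rng = np.random.default_rng(seed)
    d = vectors.shape[-1]
    # Project onto n_buckets/2 random directions; the argmax over the
    # concatenated [+proj, -proj] scores defines the bucket.
    projections = rng.normal(size=(d, n_buckets // 2))
    rotated = vectors @ projections
    return np.argmax(np.concatenate([rotated, -rotated], axis=-1), axis=-1)

# Toy usage: vectors pointing in similar directions tend to share a bucket.
x = np.array([[1.0, 0.1], [0.9, 0.2], [-1.0, 0.0], [0.0, 1.0]])
print(lsh_buckets(x))
```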
Abstract
Large Transformer models routinely achieve state-of-the-art results on a number of tasks, but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve their efficiency: dot-product attention is replaced by one that uses locality-sensitive hashing, and standard residual layers are replaced by reversible residual layers, so that activations need to be stored far less often. The resulting model, the Reformer, is much more efficient than the Transformer on long sequences.
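As a sketch of the second technique, the block below shows a reversible residual block in the RevNet style that the abstract refers to; the stand-in functions f and g are hypothetical placeholders for the attention and feed-forward sublayers, not the paper's implementation. Because the inputs can be reconstructed exactly from the outputs, intermediate activations do not have to be stored for backpropagation and can instead be recomputed.

```python
# Minimal sketch of a reversible residual block: inputs are recoverable from
# outputs, so activations need not be kept in memory during training.
import numpy as np

def f(x):  # hypothetical stand-in for the attention sublayer
    return np.tanh(x)

def g(x):  # hypothetical stand-in for the feed-forward sublayer
    return 0.5 * x

def reversible_forward(x1, x2):
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def reversible_inverse(y1, y2):
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

# Toy usage: the inverse pass recovers the original inputs exactly.
x1, x2 = np.random.randn(4), np.random.randn(4)
y1, y2 = reversible_forward(x1, x2)
r1, r2 = reversible_inverse(y1, y2)
assert np.allclose(x1, r1) and np.allclose(x2, r2)
```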