Oct, 2022
Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers
Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat...
TL;DR
Experiments demonstrate that the activation maps of trained transformer models are sparse, and the paper builds on this observation to propose a way to significantly reduce computation and improve efficiency.
Abstract
This paper studies the curious phenomenon for machine learning models with transformer architectures that their activation maps are sparse …
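The quantity at stake here is the sparsity of the activation map, which in transformer MLP blocks is typically taken to be the intermediate output after the ReLU nonlinearity. Below is a minimal PyTorch sketch of that measurement; the block shape, dimensions, and random inputs are illustrative assumptions, not the paper's actual models or data.

```python
import torch
import torch.nn as nn

# Minimal sketch: measure activation sparsity in a transformer-style
# MLP block (Linear -> ReLU -> Linear). All sizes below are hypothetical.
d_model, d_ff = 512, 2048          # assumed model and feed-forward widths

mlp_in = nn.Linear(d_model, d_ff)   # first MLP layer
mlp_out = nn.Linear(d_ff, d_model)  # second MLP layer (not needed for the measurement)

x = torch.randn(8, 128, d_model)    # (batch, tokens, d_model) stand-in input
with torch.no_grad():
    activation_map = torch.relu(mlp_in(x))          # post-ReLU intermediate output
    sparsity = (activation_map == 0).float().mean() # fraction of exact zeros

print(f"fraction of zero activations: {sparsity.item():.1%}")
```

With untrained weights and random inputs, ReLU zeros out roughly half of the entries; the phenomenon the paper reports is that training drives this fraction much higher, which is what opens the door to the computational savings mentioned in the TL;DR.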