Jun, 2024
Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization
Seungwoo Son, Wonpyo Park, Woohyun Han, Kyuyeun Kim, Jaeho Lee
TL;DR
Proposes a method called CushionCache that facilitates per-tensor activation quantization by preventing problematic tokens from being generated, effectively addressing the activation outlier problem in LLMs and delivering substantial performance gains for per-tensor activation quantization.
Abstract
Despite recent advances in LLM quantization, activation quantization remains challenging due to activation outliers. Conventional …
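
To make the motivation concrete, here is a minimal Python sketch of why activation outliers hurt per-tensor quantization and why removing outlier mass (the effect the TL;DR attributes to CushionCache's prefixed attention sinks) helps. Everything here is illustrative: `per_tensor_quantize`, the toy tensor, and the outlier magnitude are assumptions, not the paper's actual procedure for searching the prefix.

```python
import torch

def per_tensor_quantize(x: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    """Symmetric per-tensor quantization: a single scale covers the whole
    tensor, so one large outlier inflates the scale and wastes resolution
    on all the ordinary values."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.abs().max() / qmax
    return (x / scale).round().clamp(-qmax - 1, qmax) * scale

# Toy activations: one token position carries a large outlier, loosely
# mimicking the attention-sink behavior described in the abstract.
acts = torch.randn(16, 64)
acts[0] *= 50.0  # hypothetical outlier magnitude

err_plain = (acts - per_tensor_quantize(acts)).abs().mean()

# Sketch of the cushioning effect: if a prefixed sink absorbed the outlier
# mass, the remaining activations would quantize with a much tighter scale.
cushioned = acts[1:]
err_cushion = (cushioned - per_tensor_quantize(cushioned)).abs().mean()

print(f"mean abs error, plain: {err_plain:.4f}, cushioned: {err_cushion:.4f}")
```

Running this typically shows an order-of-magnitude drop in mean quantization error once the outlier row is excluded, which is the gap a per-tensor scheme stands to recover when outliers are redirected into a prefix rather than spread through the sequence.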