BriefGPT.xyz
Sep, 2022
Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
Xiuying Wei, Yunchen Zhang, Xiangguo Zhang, Ruihao Gong, Shanghang Zhang...
TL;DR
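This paper proposes an outlier suppression framework with two components, Gamma Migration and Token-Wise Clipping. Together they effectively suppress outliers, can be applied directly (plug-and-play), and bring 6-bit post-training quantization of BERT up to full-precision accuracy.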
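The TL;DR names Gamma Migration without describing it. One reading consistent with the name is that LayerNorm's per-channel scale γ, which can enlarge outlier channels, is moved ("migrated") into the following linear layer. Under that reading, the activation being quantized no longer carries γ, but the block's output stays the same. The pure-Python sketch below assumes exactly that reading. It is not the authors' code, and the helpers `layer_norm`, `linear`, `migrate_gamma`, and `step_size`, as well as all toy values, are hypothetical. The sketch checks the equivalence and shows how the 6-bit per-tensor symmetric step size (max|value| / 31) shrinks once γ is removed from the activation.

```python
# Assumed reading of "Gamma Migration" for illustration only; not the paper's implementation.
import math


def layer_norm(x, gamma, beta, eps=1e-5):
    """LayerNorm over one token's hidden vector: (x - mean) / std * gamma + beta."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    std = math.sqrt(var + eps)
    return [(v - mean) / std * g + b for v, g, b in zip(x, gamma, beta)]


def linear(x, weight):
    """Bias-free linear layer y = W x; `weight` is a list of output rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def migrate_gamma(gamma, beta, weight):
    """Hypothetical helper: move LayerNorm's gamma into the next linear layer.

    Uses W (n * gamma + beta) == (W diag(gamma)) (n + beta / gamma), so the
    LayerNorm keeps unit scale with bias beta / gamma (gamma must be nonzero)
    and input column j of W is multiplied by gamma[j].
    """
    new_gamma = [1.0] * len(gamma)
    new_beta = [b / g for b, g in zip(beta, gamma)]
    new_weight = [[w * g for w, g in zip(row, gamma)] for row in weight]
    return new_gamma, new_beta, new_weight


def step_size(values, bits=6):
    """Step of symmetric per-tensor quantization: max|v| / (2**(bits-1) - 1)."""
    return max(abs(v) for v in values) / (2 ** (bits - 1) - 1)


# Toy values (assumptions for illustration): channel 2 has a large gamma.
x = [0.1, -0.4, 1.8, -0.3]
gamma = [1.0, 0.8, 40.0, 1.1]
beta = [0.1, 0.0, 0.2, -0.1]
weight = [[0.2, -0.1, 0.05, 0.3],
          [0.0, 0.4, -0.02, 0.1]]

before = layer_norm(x, gamma, beta)              # activation with gamma applied
g2, b2, w2 = migrate_gamma(gamma, beta, weight)
after = layer_norm(x, g2, b2)                    # gamma-free activation

print(linear(before, weight))  # original block output
print(linear(after, w2))       # same values up to floating-point rounding
print(step_size(before), step_size(after))  # 6-bit step is much smaller after migration
```

This toy has two limitations. First, the migrated weights now carry γ, which can make weight quantization harder. Second, it ignores the residual connection that, in BERT, also consumes the LayerNorm output.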
Abstract
The Transformer architecture has become the fundamental building block of widely used natural language processing (NLP) models. With the trend toward large NLP models, the increasing memory and computation costs hinder their...