Feb 2024
Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers
Junhan Kim, Kyungphil Park, Chungman Lee, Ho-young Kim, Joonyoung Kim...
TL;DR
This paper proposes a novel PTQ algorithm, aespa, which achieves efficiency through layer-wise quantization while accounting for cross-layer dependencies in order to preserve attention scores. Extensive experiments on various language models, together with a complexity analysis, demonstrate that aespa quantizes Transformer models both accurately and efficiently.
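The TL;DR is compact, so a small sketch may help make the objective concrete: rather than matching each projection's output in isolation, the loss below couples the quantized query and key weights through the attention scores they jointly produce. This is only an illustrative toy in PyTorch; the round-to-nearest 4-bit baseline and the helper names (fake_quantize, attention_score_loss) are assumptions for illustration, not aespa's actual procedure.

    import torch

    def fake_quantize(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
        # Round-to-nearest uniform quantization; a baseline stand-in, not aespa.
        qmax = 2 ** (n_bits - 1) - 1
        scale = w.abs().max() / qmax
        return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

    def attention_score_loss(x, wq, wk, wq_hat, wk_hat):
        # Error between full-precision and quantized attention scores.
        # Coupling W_Q and W_K in a single objective reflects the
        # "cross-layer dependency" the TL;DR mentions; a per-layer
        # output MSE would treat the two projections independently.
        d = wq.shape[-1]
        scores = (x @ wq) @ (x @ wk).transpose(-1, -2) / d ** 0.5
        scores_hat = (x @ wq_hat) @ (x @ wk_hat).transpose(-1, -2) / d ** 0.5
        return torch.mean((scores - scores_hat) ** 2)

    # Toy usage: measure the attention-score distortion of naive 4-bit RTN.
    x = torch.randn(8, 16, 64)          # (batch, tokens, hidden)
    wq, wk = torch.randn(64, 64), torch.randn(64, 64)
    loss = attention_score_loss(x, wq, wk, fake_quantize(wq), fake_quantize(wk))
    print(f"attention-score MSE under 4-bit RTN: {loss.item():.4f}")

A PTQ scheme in this spirit would search, layer by layer, for quantized weights that minimize such a score-level loss instead of a purely local reconstruction error.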
Abstract
With the increasing complexity of generative AI models, post-training quantization (PTQ) has emerged as a promising solution for deploying hyper-scale models on edge devices such as mobile phones and TVs.