BriefGPT.xyz
Jun, 2022
nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models
Gunho Park, Baeseong Park, Se Jung Kwon, Byeongwook Kim, Youngjoo Lee...
TL;DR
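This paper proposes an efficient inference framework for large-scale generative language models. Building on recent advances in self-supervised learning and the Transformer architecture that achieve low perplexity, it uses non-uniform quantization and the accelerated matrix multiplication of nuQmm to shrink model size and reduce the inference latency of large LMs.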
Abstract
The recent advance of self-supervised learning associated with the Transformer architecture enables natural language processing (NLP) to e…