BriefGPT.xyz
Apr, 2024
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu...
TL;DR
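Efficient inference for large language models must overcome three obstacles: large model size, the high complexity of the attention operation, and autoregressive decoding. This paper surveys the existing literature on techniques for improving the inference efficiency of large language models. It covers optimizations at the data level, the model level, and the system level, and it gives a quantitative analysis through experiments. It closes by summarizing the relevant knowledge and discussing future research directions.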
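Some back-of-envelope arithmetic makes the three cost drivers named in the TL;DR concrete. The Python sketch below is illustrative only and is not taken from the survey. The function names, the hypothetical 7B-parameter FP16 model, and the toy next-token function are all assumptions chosen for the example. The sketch computes three quantities: the memory needed for the weights alone, the number of entries in full attention score matrices (which grows quadratically with sequence length), and the number of sequential forward passes required by autoregressive decoding.

```python
from typing import Callable, List, Tuple


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    # Memory to hold the weights alone (ignores activations and KV cache).
    return num_params * bytes_per_param / 2**30


def attention_score_entries(seq_len: int, num_heads: int, num_layers: int) -> int:
    # Standard self-attention builds a seq_len x seq_len score matrix per head
    # and per layer, so this count grows quadratically with seq_len.
    return seq_len * seq_len * num_heads * num_layers


def greedy_generate(next_token: Callable[[List[int]], int],
                    prompt: List[int], new_tokens: int) -> Tuple[List[int], int]:
    # Autoregressive decoding: each new token depends on the one produced in
    # the previous step, so the forward passes run one after another.
    tokens = list(prompt)
    passes = 0
    for _ in range(new_tokens):
        tokens.append(next_token(tokens))  # one (hypothetical) model call
        passes += 1
    return tokens, passes


if __name__ == "__main__":
    # Hypothetical 7B-parameter model stored in FP16 (2 bytes per parameter).
    print(f"weights: {weight_memory_gib(7e9, 2):.1f} GiB")  # weights: 13.0 GiB
    # Doubling the context length quadruples the attention score entries.
    ratio = attention_score_entries(4096, 32, 32) / attention_score_entries(2048, 32, 32)
    print(f"attention growth: {ratio}x")  # attention growth: 4.0x
    # Toy "model" that just increments the last token id.
    toks, passes = greedy_generate(lambda t: t[-1] + 1, [0], 3)
    print(toks, passes)  # [0, 1, 2, 3] 3
```

The toy model stands in for a real network; only the shape of the loop matters here: generating N tokens takes N dependent forward passes, which is why decoding is hard to parallelize across output positions.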
Abstract
Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks. However, the substantial computational and memory requirements of LLM inference pose challenges…