Apr, 2024
Enhancing Inference Efficiency of Large Language Models: Investigating Optimization Strategies and Architectural Innovations
Georgy Tyukin
TL;DR
Skipping the later attention sub-layers in Transformer LLMs is an effective way to compress large language models, improving performance while reducing compute cost. On Llama 2 7B this yielded a 21% speedup in token generation and, unexpectedly, improved scores on several common benchmarks.
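The TL;DR describes dropping the attention sub-layers of the later Transformer blocks while keeping their MLP sub-layers. Below is a minimal NumPy sketch of that idea; the layer sizes, the single-head attention, and the `skip_attn_from` parameter are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16       # hypothetical hidden size (illustrative only)
LAYERS = 4   # hypothetical number of Transformer blocks

def attention(x, Wq, Wk, Wv):
    """Single-head self-attention (illustrative, no masking)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(D)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def mlp(x, W1, W2):
    """Two-layer feed-forward sub-layer with ReLU."""
    return np.maximum(x @ W1, 0.0) @ W2

def forward(x, blocks, skip_attn_from=None):
    """Run the block stack; drop the attention sub-layer in every
    block at index >= skip_attn_from (None = keep all attention)."""
    for i, b in enumerate(blocks):
        if skip_attn_from is None or i < skip_attn_from:
            x = x + attention(x, b["Wq"], b["Wk"], b["Wv"])
        x = x + mlp(x, b["W1"], b["W2"])  # MLP sub-layer always runs
    return x

def make_block():
    # Small random weights so residual activations stay bounded.
    return {k: rng.normal(size=(D, D)) * 0.05
            for k in ("Wq", "Wk", "Wv", "W1", "W2")}

blocks = [make_block() for _ in range(LAYERS)]
x = rng.normal(size=(8, D))  # 8 token embeddings

full   = forward(x, blocks)                    # all sub-layers active
pruned = forward(x, blocks, skip_attn_from=2)  # attention only in blocks 0-1
```

The appeal of skipping *later* attention sub-layers, as the summary suggests, is that each skipped sub-layer removes a per-token cost that grows with sequence length, while the MLP sub-layers (and the early attention layers that mix token information) are left untouched.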
Abstract
Large language models are growing in size, and we expect them to continue to do so, as larger models train quicker. However, this increase in size will severely impact inference costs. Therefore model compression …